I, Rodger Smith, Ph.D., hereby declare and state as follows:

1. I am currently employed as a Senior Scientist I in Lead Identification at 
CoGenesys, a division of Human Genome Sciences, Inc. I understand that Human 
Genome Sciences is the assignee of the above-captioned patent application. I earned my 
Ph.D. in 1989 from the Department of Microbiology at the University of Illinois,
Urbana-Champaign, Illinois. It was during my thesis research that I first began work on cloning
and sequencing of antibody genes. From 1990 to 1999, I worked as a Scientist in the 
Molecular Biology and Assay Development groups at IGEN International, Inc. (now 
known as Bioveris Corp.) where my primary responsibilities were developing and 
characterizing antibody reagents for therapeutic and diagnostic applications. A portion of 
this work entailed the design and construction of both human and mouse V-domain 
antibody repertoire libraries for display on the surface of bacteriophage, including work in 1997
under an SBIR grant sponsored by the Department of the Army to construct and validate a large
semi-synthetic human phage antibody display library. In 2000, I joined the Antibody
Development group at Human Genome Sciences where I have continued to work with 
phage antibody display technology, primarily for developing therapeutic antibodies to a
variety of novel protein targets. A large portion of this work involved screening and
characterizing hundreds of antibody leads at both the DNA sequence and protein level. I 
am the co-author of 12 scientific articles and several issued and pending patent 
applications. A copy of my curriculum vitae is attached as Exhibit H.¹

2. On December 14, 2004, I signed a Declaration (hereafter "the December 
2004 Declaration") explaining that an antibody scientist on or before June 16, 2000 would 
have been able to identify and correct errors present in Table 1 of the 09/880,748 
application (hereafter "the '748 application"). The errors pertained to the delineation of
the VL region of the amino acid sequence of certain scFvs disclosed in SEQ ID NOS:1-2128.
I understand that the December 2004 Declaration was submitted to the United
States Patent and Trademark Office (hereafter "Patent Office") on December 14, 2004 to 
support the permissibility of correcting the errors in delineating the VL region in column 3 
of Table 1 of the '748 application. I understand the Patent Office has not allowed the 
corrections to Table 1 because the Patent Office was not convinced that an antibody 
scientist would have been able to both recognize and correct the errors in Table 1 of the 
'748 application as of June 16, 2000. 

3. I have read and examined the Examiner's comments relating to the 
December 2004 Declaration as set forth in the communication from the Patent Office 
mailed May 5, 2005 on pages 5-11 under the heading "New Objections or Rejections 
based on Amendment" (hereafter "the comments mailed May 5, 2005"). I maintain that an 
antibody scientist, indeed, would have been able to both recognize and correct the errors in 
the delineation of the VL region in column 3 of Table 1 of the '748 application as of June 
16, 2000. Further, to convince the Patent Office of the accuracy of my position, I believe 
that certain points from the December 2004 Declaration warrant clarification. 



¹ On December 14, 2004, I signed a Declaration under 37 C.F.R. § 1.132 ("the December 2004 Declaration")
which was submitted to the United States Patent and Trademark Office in connection with this case. The 
December 2004 Declaration was submitted with 7 Exhibits labeled Exhibits A-G. Copies of Exhibits that 
were previously submitted in conjunction with the December 2004 Declaration, and that are referred to in the 
present Declaration are attached hereto using the same Exhibit Letter as in the December 2004 Declaration. 
Exhibits newly submitted in conjunction with this Declaration are consecutively lettered from H-M. 
Although a copy of my curriculum vitae (CV) was submitted with the December 2004 Declaration as Exhibit 
A, I have revised my CV to reflect my transfer within HGS to the CoGenesys division since my signing of 
the December 2004 Declaration. My updated CV is submitted herewith as Exhibit H. 
Summary of Previous Declaration

4. In the December 2004 Declaration, I stated that an antibody scientist would
be able both to identify and to correct errors in the delineation of the VL domain in Table 1
of the '748 application. The premise of the December 2004 Declaration was that an antibody
scientist can (and could have as of June 16, 2000) recognize and correct the errors in Table
1 on the basis of an art-accepted standard numbering system for the variable regions of
kappa and lambda light chains. In paragraph 9 of the December 2004 Declaration, I stated:

The beginning of the VL region in an scFv may be easily 
delineated by 1) determining whether the scFv contains a 
kappa or a lambda variable domain and then 2) calculating 
the first amino acid sequence based on a standard numbering 
system for immunoglobulin variable regions that was 
established by Elvin A. Kabat and Tai Te Wu in the 1970's 
that is widely used by immunologists even today. 

5. In the comments mailed May 5, 2005, concerns were raised by the Patent 
Office regarding both steps, i.e., the first step of recognizing if a light chain variable 
domain is a kappa variable domain or a lambda variable domain and the second step of 
applying the Kabat-Wu numbering system. Below, I make clarifications on both issues.

Determining if the VL in the scFv is a Vκ or a Vλ

6. Paragraphs 10-13 of the December 2004 Declaration explained that an
antibody scientist could routinely determine if a VL domain was a Vκ or a Vλ. In
paragraph 13, I stated that one of skill in the art could accomplish this task by aligning the
sequence of an scFv against a database containing known human germline genes and
identifying whether the "closest germline" gene is a Vκ or a Vλ.

7. To clarify, the purpose of aligning a given VL domain to known germline
genes is not to "identify the closest germline gene" per se. Rather, the point is to
determine if the VL domain in question is most similar to Vκ or Vλ variable regions
because if a VL domain in question is most similar to Vκ variable regions, it too is a Vκ,
whereas if the VL domain in question is most similar to Vλ variable regions, it is a Vλ.
The ability to classify a VL gene as either a Vκ or Vλ via alignment to known VL
sequences is based on the principle that Vκs are more similar to other Vκs than they are to
Vλs and vice versa. This principle is set forth in the 1993 textbook, Fundamental
Immunology:

Variable region sequences do not randomly differ relative to 
each other — even within the CDRs. Analyses of hundreds 
of V regions reveal that sequences naturally fall into a 
homology based hierarchy directly related to the 
germline antibody gene loci. Members within a 
hierarchical group are more similar to each other than to 
all sequences from other groups; furthermore, similar
sequences display a shared pattern of amino acid 
substitutions that serve as "membership badges" for the 
various classifications. Through examination of these 
linked-substitutions, an evolutionary history of V regions 
can be obtained. Of course, the oldest and most basic group 
is that of the V regions themselves, followed by the division 
into VH, Vκ and Vλ representing separate V-gene loci on
different chromosomes. (emphasis added; Fundamental
Immunology, 3rd edition, edited by William Paul, Raven
Press, New York, 1993: p. 290, Exhibit I)

8. Therefore, in view of the overarching principle that Vκs are more similar to
other Vκs than they are to Vλs and vice versa, an antibody scientist can, and could have
as of June 16, 2000, use such routine alignments to classify a VL domain as either a Vκ
or Vλ. For this purpose, any database containing human light chain germline variable
region genes could be used, not just the database listed in paragraph [0669] of the '748
application. Furthermore, for this purpose, it is not necessary to definitively establish the 
amino acid sequence of the actual germline gene. 
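
By way of illustration only, the classification principle described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the alignment procedure actually used: the similarity score from Python's `difflib.SequenceMatcher` stands in for a real alignment score, only four germline sequences (copied from the germline listings reproduced in the exhibits) are included, and the query sequence is hypothetical.

```python
from difflib import SequenceMatcher

# Germline V-region sequences copied from the kappa and lambda germline
# listings in the exhibits (two of each; a real search would use every
# germline sequence in the chosen database).
GERMLINES = {
    "kappa": {
        "O12": ("DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIYAASSLQSGVPS"
                "RFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSTP"),
        "A27": ("EIVLTQSPGTLSLSPGERATLSCRASQSVSSSYLAWYQQKPGQAPRLLIYGASSRATGIP"
                "DRFSGSGSGTDFTLTISRLEPEDFAVYYCQQYGSSP"),
    },
    "lambda": {
        "V1-16": ("QSVLTQPPSASGTPGQRVTISCSGSSSNIGSNTVNWYQQLPGTAPKLLIYSNNQRPSGVP"
                  "DRFSGSKSGTSASLAISGLQSEDEADYYCAAWDDSLNGP"),
        "V2-1": ("SYELTQPPSVSVSPGQTASITCSGDKLGDKYACWYQQKPGQSPVLVIYQDSKRPSGIPER"
                 "FSGSNSGNTATLTISGTQAMDEADYYCQAWDSSTA"),
    },
}


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; an assumed stand-in for an alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def classify_light_chain(vl: str) -> tuple[str, str, float]:
    """Return (class, best-matching germline name, score) for a VL sequence."""
    best = ("", "", -1.0)
    for light_class, genes in GERMLINES.items():
        for name, germline in genes.items():
            score = similarity(vl, germline)
            if score > best[2]:
                best = (light_class, name, score)
    return best


# Hypothetical query: the O12 germline with two substitutions, standing in
# for a somatically mutated kappa VL domain.
query = ("DIQMTQSPSSLSASVGDRVTITCRASQSISNYLNWYQQKPGKAPKLLIYAASTLQSGVPS"
         "RFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSTP")
light_class, gene, score = classify_light_chain(query)
print(light_class, gene, round(score, 2))
```

Only the class of the best match matters for the kappa/lambda decision; the identity of the closest germline gene is incidental, consistent with paragraph 7.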

Kabat-Wu Numbering System

9. Paragraphs 14-15 of the December 2004 Declaration explained that an
art-accepted numbering system for immunological variable domains developed by Elvin A.
Kabat and Tai Te Wu in the 1970s could be used to identify the first amino acid residue in 
a VL domain. 



10. In brief, the Kabat-Wu numbering system recognizes the presence of an
invariant cysteine residue at position 23 of immunoglobulin light chain variable regions.
The Kabat-Wu numbering system also acknowledges that lambda light chains have a
deletion at Kabat-Wu position 10, so the invariant cysteine residue at Kabat-Wu position
23 in lambda light chains is actually the 22nd amino acid residue in the Vλ domain. At
paragraph 15 of the December 2004 Declaration, I stated that on the basis of this
information, an antibody scientist could identify amino acid residue number 1 in a VL
domain by identifying the invariant cysteine residue at Kabat-Wu position 23, assigning
that cysteine as amino acid residue number 23 or 22 depending on whether the VL domain
was a Vκ or Vλ, respectively, and counting backwards to amino acid residue 1 to identify
the first amino acid residue of the VL region.
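
The counting step described above can be illustrated with a short Python sketch. It is an illustration under stated assumptions only: the two scFv fragments are hypothetical constructs made by joining a (Gly4Ser)3 linker to the start of a germline sequence from the exhibits, the function name is invented for this example, and the invariant cysteine is taken to be the first cysteine in each fragment (true for these fragments, not for a full scFv, whose VH domain also contains cysteines).

```python
# Ordinal position of the invariant cysteine within the VL domain:
# Kabat-Wu position 23 is the 23rd residue in kappa chains but, because
# lambda chains lack position 10, only the 22nd residue in lambda chains.
CYS_ORDINAL = {"kappa": 23, "lambda": 22}


def vl_start_index(cys_index: int, light_class: str) -> int:
    """Return the 0-based index of VL residue 1, counting back from the Cys."""
    start = cys_index - (CYS_ORDINAL[light_class] - 1)
    if start < 0:
        raise ValueError("sequence is too short upstream of the cysteine")
    return start


# Hypothetical fragments: linker followed by the start of germline O12
# (kappa) or V2-1 (lambda). In the lambda fragment the final Ser of the
# linker doubles as VL residue 1.
kappa_fragment = "GGGGSGGGGSGGGGS" + "DIQMTQSPSSLSASVGDRVTITCRASQSISSYLN"
lambda_fragment = "GGGGSGGGGSGGGG" + "SYELTQPPSVSVSPGQTASITCSGDKLGDKYA"

for fragment, light_class in ((kappa_fragment, "kappa"),
                              (lambda_fragment, "lambda")):
    cys = fragment.index("C")  # first Cys; assumed to be the invariant one
    start = vl_start_index(cys, light_class)
    print(light_class, start, fragment[start:start + 6])
# prints:
# kappa 15 DIQMTQ
# lambda 14 SYELTQ
```

In the lambda fragment the count lands on the serine at index 14, the last residue of the linker, which is the situation addressed in paragraph 24.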

11. In the December 2004 Declaration, I supplied a copy of Table 1 from
the Introduction of the fifth edition of Sequences of Proteins of Immunological Interest as 
documentation of the Kabat-Wu numbering system (Exhibit G). Table 1 from Sequences 
of Proteins of Immunological Interest indicates that there is an "occasional" amino acid 
residue at position number 0 in immunoglobulin light chains. In the comments mailed 
May 5, 2005, the Patent Office took the position that the amino acid residues that were 
being excluded from the delineation of the VL region in Table 1 of the '748 application by 
virtue of the requested corrections could be construed to be examples of VL regions which 
had an amino acid residue at Kabat-Wu position 0. 

12. I disagree with this position taken by the Patent Office. I do not recall ever 
working with a human light chain variable region that contained an amino acid residue at 
position 0, nor would I expect to work with one because human germline variable region 
sequences do not contain an amino acid at position number 0 (e.g., see Exhibits C and D). 
Based on my experience, I would characterize a human light chain variable region with
an amino acid residue at position number 0 as "extremely rare." 

13. Unfortunately, Sequences of Proteins of Immunological Interest does
not further define the meaning of the "occasional" amino acid residue at position 0 in 
terms of a frequency of occurrence. Nonetheless, given the sheer volume of known 
immunoglobulin VL region sequences, the frequency of human VL domains with an 
amino acid residue at position 0 can be better defined empirically as set forth below. 
Frequency of human VL domains with an amino acid residue at position 0

14. In addition to establishing a standard numbering system for 
immunoglobulin variable region sequences, Kabat and Wu also established a database that
contains thousands of examples of immunoglobulin sequences (See Exhibit F for a brief 
description of this database). 

15. For the exercise described below, an online database known as the
KabatMan database was used. This database is available at the website of Dr. Andrew
Martin, a bioinformaticist and professor at University College London. The URL
for this website is: http://www.bioinf.org.uk/abs/simkab.html. The KabatMan database
contains sequences from the July 12, 2000 version of the Kabat-Wu database. The
contents of the KabatMan database are described on the website as follows, "Only 
immunoglobulin sequences are stored here; all T-cell receptor, MHC sequences, etc. are 
rejected. In addition, sequences with fewer than 75 residues are rejected so that the 
database contains only essentially complete light or heavy chain sequences." 

16. The KabatMan database was queried to provide a listing of all the human 
light chain sequences it contained. Prior to analysis, any sequences that were N-terminally 
truncated were removed from the dataset, as were any redundant amino acid sequences. 
The dataset analyzed contained 1,745 immunoglobulin light chain sequences. Of these,
only 5 of 1,745 (0.29%) contained an amino acid residue at position 0. These sequences are
shown in Exhibit J. 
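
For reference, the stated percentage follows directly from the two counts given above. The short Python check below uses only those counts; the variable names are illustrative.

```python
# Counts stated in paragraph 16.
with_position_0 = 5
total_sequences = 1745

percent = 100 * with_position_0 / total_sequences
print(f"{percent:.2f}%")  # prints 0.29%
```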

17. This 0.29% frequency was so low as to suggest that the sequences 
containing an amino acid residue at position number 0 were in error. Both the Kabat-Wu 
database and the KabatMan derivative of the Kabat-Wu database provide information as to 
the original source of each sequence contained in the databases. Moreover, it is suggested 
by the current curators of the Kabat database that "if there are doubts about these 
sequences or their annotations, please refer to the original papers" (Exhibit F, top of the 
right hand column on page 214). Because the frequency of the position 0 amino acid 
residue was so low in human VL domains as to suggest error, the original papers 
presenting these sequences were consulted. Exhibit J also lists the original citations 
associated with each sequence. 
18. The 5 sequences identified in paragraph 16 above as containing a position 0
amino acid residue are the HBL-2'CL, HBL-3'CL, MP9'CL, RPMI8226'CL and
MC116'CL sequences. These 5 sequences were originally published in three scientific
articles. In each case, the presence of the amino acid residue at position 0 in the database 
sequence turned out to be an error. 

The HBL-2'CL and the HBL-3'CL sequences

19. The HBL-2'CL and the HBL-3'CL sequences were originally presented in
a 1994 article published in Blood by Riboldi et al. (submitted herewith as Exhibit K). The
alanine (A) amino acid residue at position 0 shown in the sequences in the KabatMan
database is simply not present in the original sequences (see Figure 2B). Nor is the
position 0 alanine present in the GenBank Reports for these
sequences.² Thus, it is clear that these two examples of position 0 amino acids are simply
inaccurate.

The MP9'CL sequence

20. The MP9'CL sequence was originally presented in a 1995 article published 
in Molecular Immunology by Andris et al. (submitted herewith as Exhibit L). The aspartic 
acid (D) amino acid residue at position 0 shown in the sequence in the KabatMan database 
is present in the sequence in the original paper (see Figure 3C), but careful reading of the 
paper shows that this amino acid residue is actually encoded by the vector sequence into 
which the light chain was cloned prior to sequencing. Figure 3C indicates that the MP9
variable region is a Vλ2 variable region. Table 2 indicates that the PCR primer used to
amplify this sequence would have been the VLAM2 primer and that this primer amplifies
V lambda gene segments "from the beginning of framework 1." Reproduced below is the
beginning of the MP9 sequence in which the VLAM2 primer sequence is underlined:

GAT TCG TAT CAG CTG ACG CAG CCT CCC TCC...
 D   S   Y   Q   L   T   Q   P   P   S ...

The GAT codon encoding the position 0 amino acid is 5' of the PCR primer,
indicating that the source of this codon cannot be the light chain. Reading of the Materials
and Methods section shows that the PCR products encoding the light chains were cloned
as blunt-ended fragments into the EcoRV restriction site of the Bluescript phagemid vector
(see page 1108, section entitled "Isolation, cloning and sequencing of the amplified
products", first sentence). The EcoRV site is reproduced below with the cut site indicated
by the forward slash mark.

GAT/ATC 
CTA/TAG 

Thus, the position 0 aspartic acid residue in the MP9'CL light chain in the KabatMan 
database is the result of the erroneous inclusion of the 5' portion of the vector-derived 
EcoRV restriction site. 
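
The reading of the MP9 sequence described above can be checked with a brief Python sketch. It is offered as an illustration only and assumes the standard genetic code; the function and variable names are invented for this example, and the nucleotide sequence is the one reproduced in paragraph 20.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna: str) -> str:
    """Translate a DNA string in frame 1, ignoring spaces and any partial codon."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


ECORV_SITE = "GATATC"  # EcoRV cuts blunt between GAT and ATC
mp9_start = "GAT TCG TAT CAG CTG ACG CAG CCT CCC TCC"

print(translate(mp9_start))                      # prints DSYQLTQPPS
print(mp9_start.startswith(ECORV_SITE[:3]))      # prints True
```

The first codon translates to aspartic acid (D) and is identical to the 5' half of the EcoRV site, which is consistent with the vector-derived origin described in the text.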

The RPMI8226'CL and the MC116'CL sequences

21. The RPMI8226'CL and the MC116'CL sequences were originally
presented in a 1995 article published in The Scandinavian Journal of Immunology by
Watkins et al. (submitted herewith as Exhibit M). The glutamic acid residue at
position 0 shown in these sequences in the Kabat database is present in the sequence in the
original paper (see Table 2), but careful reading of the paper shows that this amino acid
residue is actually encoded by the PCR primer used to amplify the light chain from the
RPMI8226 and MC116 cell lines. It is reported at page 446, top of the left hand column,
that the light chains expressed by the MC116 and RPMI8226 cell lines are both Vλ2 light
chains, but that these sequences were best amplified with a primer meant to amplify Vλ1
light chains, primer 3a shown in Table 1. The legend to Table 1 indicates that the
underlined sequence in each primer is a restriction site that was inserted into the PCR
primer for cloning purposes. Reproduced below is the beginning of the RPMI8226'CL
and MC116'CL sequences as shown in Table 2 of the Watkins paper in which the 3a
primer sequence is underlined and the inserted restriction site is shown in boldfaced text:

GAG CTC TCT GTG CTG ACT CAG CCT GCC...
 E   L   S   V   L   T   Q   P   A

Because the 5' end of each of these light chain sequences is the region to which the PCR
primer annealed, the 5' end of this sequence is defined by the PCR primer and not the light
chains found in the MC116 and RPMI8226 cell lines. Moreover, the glutamic acid
residue at position 0 is encoded by the restriction site that was inserted into the PCR
primer. Thus, the presence of the glutamic acid residue at position 0 in the
RPMI8226'CL and the MC116'CL light chain sequences in the KabatMan database is an
error.

² Accession Numbers L29113 and L29114 are listed in the legend of Figure 2 of the Riboldi et al. reference
as the GenBank Accession Numbers for the HBL-2'CL and the HBL-3'CL sequences.

22. Accordingly, 0/1745 human light chain variable region sequences in the 
KabatMan database contain a verifiable position 0 amino acid residue. Thus, the presence 
of the "occasional" position 0 amino acid in human VL regions is, as my previous 
experience taught me, either extremely rare or non-existent. 

Correction of the Delineation of VL regions

23. I maintain that an antibody scientist as of June 16, 2000 would have been 
able to recognize and correct the errors in Table 1 of the '748 application relating to the 
delineation of the VL region in the scFvs of SEQ ID NOS:1-2128 on the basis of the
Kabat-Wu numbering system. This is true whether the error was the inclusion of a few
extra amino acids (usually A, AL or AF) or the omission of a serine (S) residue at the
N-terminal end of certain VL regions.

24. With particular regard to the omitted serine residues, an antibody scientist 
would certainly have made this correction even though the omitted serine residue to be 
included is also the last residue of the (Gly4Ser)3 linker sequence. An antibody scientist
would make this correction on the basis of the Kabat-Wu numbering system established
after careful analysis and alignment of numerous V region sequences and on the basis that
it is accepted that human germline light chain variable regions have an amino acid residue
at Kabat-Wu position 1.³



³ The information provided in paragraph 18 of the December 2004 Declaration, describing how, in 238/239
instances where a correction to delineate the VL region as including the last serine of the (Gly4Ser)3 linker
sequence was requested, the closest identified germline gene was a V gene whose first amino acid residue
was a serine, was tangential information, meant by way of explanation and as a secondary proof that one
would wish to include the residue. However, even in the case where the initial amino acid of the closest 
germline gene was not serine (i.e., SEQ ID NO: 1389), the correction would still be made by the antibody 
scientist as of June 16, 2000 to include the serine residue because of the mandates of the Kabat-Wu 
numbering system. 
SUMMARY



25. On or before June 16, 2000, an antibody scientist examining the
information presented in Table 1 and the sequences of SEQ ID NOS:1-2128 of the
Sequence Listing of the '748 application would have readily recognized that in several
instances, the amino acid residues delineated in Table 1 as making up the VL region of
certain scFvs were incorrect for either containing a few additional amino acids at the
amino terminal end of the VL region or for lacking an amino acid at the amino terminal
end of the VL region. Moreover, on the basis of the Kabat-Wu numbering system, an
antibody scientist would also have been able to correct the delineations of the VL regions
in Table 1 of the '748 application. The corrections an antibody scientist would have made
are the same as those that were requested in the '748 application on December 14, 2000.

26. I hereby declare that all statements made herein of my own knowledge are 
true and that all statements made on information and belief are believed to be true; and 
further that these statements were made with the knowledge that willful false statements 
and the like so made are punishable by fine or imprisonment, or both, under § 1001 of 
Title 18 of the United States Code, and that such willful false statements may jeopardize 
the validity of the application captioned above or any patent issuing thereupon. 





Rodger Smith, Ph.D. 
Although the human Ig kappa locus has not been fully sequenced yet, all of the individual genes are likely
to have been isolated (Schable KF and Zachau HG. 1993; Brensing-Kuppers J. et al. 1997). The
following sequences are taken from these studies.

Total number of sequences: 46

>A1 

DVVMTQSPLSLPVTLGQPASISCRSSQSLVYSDGNTYLNWFQQRPGQSPRRLIYKVSNWD 
SGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCMQGTHWP 



>A10 

EIVLTQSPDFQSVTPKEKVTITCRASQSIGSSLHWYQQKPDQSPKLLIKYASQSFSGVPS 
RFSGSGSGTDFTLTINSLEAEDAATYYCHQSSSLP 



>A11

EIVLTQSPATLSLSPGERATLSCGASQSVSSSYLAWYQQKPGLAPRLLIYDASSRATGIP 
DRFSGSGSGTDFTLTISRLEPEDFAVYYCQQYGSSP 



>A14 

DVVMTQSPAFLSVTPGEKVTITCQASEGIGNYLYWYQQKPDQAPKLLIKYASQSISGVPS 
RFSGSGSGTDFTFTISSLEAEDAATYYCQQGNKHP



>A17 

DVVMTQSPLSLPVTLGQPASISCRSSQSLVYSDGNTYLNWFQQRPGQSPRRLIYKVSNRD
SGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCMQGTHWP 



>A18

DIVMTQTPLSLSVTPGQPASISCKSSQSLLHSDGKTYLYWYLQKPGQSPQLLIYEVSSRF 
SGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCMQGIHLP 



>A19 

DIVMTQSPLSLPVTPGEPASISCRSSQSLLHSNGYNYLDWYLQKPGQSPQLLIYLGSNRA 
SGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCMQALQTP 



>A2 

DIVMTQTPLSLSVTPGQPASISCKSSQSLLHSDGKTYLYWYLQKPGQPPQLLIYEVSNRF 
SGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCMQSIQLP 



>A20 

DIQMTQSPSSLSASVGDRVTITCRASQGISNYLAWYQQKPGKVPKLLIYAASTLQSGVPS 
RFSGSGSGTDFTLTISSLQPEDVATYYCQKYNSAP



>A23 

DIVMTQTPLSSPVTLGQPASISCRSSQSLVHSDGNTYLSWLQQRPGQPPRLLIYKISNRF 
SGVPDRFSGSGAGTDFTLKISRVEAEDVGVYYCMQATQFP 



>A26 

EIVLTQSPDFQSVTPKEKVTITCRASQSIGSSLHWYQQKPDQSPKLLIKYASQSFSGVPS 
RFSGSGSGTDFTLTINSLEAEDAATYYCHQSSSLP 



http://www.ncbi.nlm.nih.gov/igblast/showGermline.cgi?organism=hu 12/9/2004
>A27

EIVLTQSPGTLSLSPGERATLSCRASQSVSSSYLAWYQQKPGQAPRLLIYGASSRATGIP 
DRFSGSGSGTDFTLTISRLEPEDFAVYYCQQYGSSP



>A3 

DIVMTQSPLSLPVTPGEPASISCRSSQSLLHSNGYNYLDWYLQKPGQSPQLLIYLGSNRA
SGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCMQALQTP 



>A30 

DIQMTQSPSSLSASVGDRVTITCRASQGIRNDLGWYQQKPGKAPKRLIYAASSLQSGVPS 
RFSGSGSGTEFTLTISSLQPEDFATYYCLQHNSYP 



>A5 

EIVMTQTPLSLSITPGEQASISCRSSQSLLHSDGYTYLYWFLQKARPVSTLLIYEVSNRF 
SGVPDRFSGSGSGTDFTLKISRVEAEDFGVYYCMQDAQDPP 



>A7 

DIVKTQTPLSSPVTLGQPASISFRSSQSLVHSDGNTYLSWLQQRPGQPPRLLIYKVSNRF 
SGVPDRFSGSGAGTDFTLKISRVEAEDVGVYYCTQATQFP 



>B2 

ETTLTQSPAFMSATPGDKVNISCKASQDIDDDMNWYQQKPGEAAIFIIQEATTLVPGIPP
RFSGSGYGTDFTLTINNIESEDAAYYFCLQHDNFP 



>B3 

DIVMTQSPDSIAVSLGERATINCKSSQSVLYSSNNKNYLAWYQQKPGQPPKLLIYW
ESGVPDRFSGSGSGTDFTLTISSLQAEDVAVYYCQQYYSTP 



>L1 

DIQMTQSPSSLSASVGDRVTITCRASQGISNYLAWFQQKPGKAPKSLIYAASSLQSGVPS 
RFSGSGSGTDFTLTISSLQPEDFATYYCQQYNSYP



>L10 

EIVMTQSPPTLSLSPGERVTLSCRASQSVSSSYLTWYQQKPGQAPRLLIYGASTRATSIP 
ARFSGSGSGTDFTLTISSLQPEDFAVYYCQQDHNLPP 



>L11 

AIQMTQSPSSLSASVGDRVTITCRASQGIRNDLGWYQQKPGKAPKLLIYAASSLQSGVPS 
RFSGSGSGTDFTLTISSLQPEDFATYYCLQDYNYP



>L12 

DIQMTQSPSTLSASVGDRVTITCRASQSISSWLAWYQQKPGKAPKLLIYDASSLESGVPS 
RFSGSGSGTEFTLTISSLQPDDFATYYCQQYNSYS 



>L14 

NIQMTQSPSAMSASVGDRVTITCRARQGISNYLAWFQQKPGKVPKHLIYAASSLQSGVPS 
RFSGSGSGTEFTLTISSLQPEDFATYYCLQHNSYP 



>L15

DIQMTQSPSSLSASVGDRVTITCRASQGISSWLAWYQQKPEKAPKSLIYAASSLQSGVPS 
RFSGSGSGTDFTLTISSLQPEDFATYYCQQYNSYP 



>L16 

EIVMTQSPATLSVSPGERATLSCRASQSVSSNLAWYQQKPGQAPRLLIYGASTRATGIPA 
RFSGSGSGTEFTLTISSLQSEDFAVYYCQQYNNWP



>L18 

AIQLTQSPSSLSASVGDRVTITCRASQGISSALAWYQQKPGKAPKLLIYDASSLESGVPS 
RFSGSGSGTDFTLTISSLQPEDFATYYCQQFNNYP



>L19 

DIQMTQSPSSVSASVGDRVTITCRASQGISSWLAWYQQKPGKAPKLLIYAASSLQSGVPS 
RFSGSGSGTDFTLTISSLQPEDFATYYCQQANSFP



>L2 

EIVMTQSPATLSVSPGERATLSCRASQSVSSNLAWYQQKPGQAPRLLIYGASTRATGIPA 
RFSGSGSGTEFTLTISSLQSEDFAVYYCQQYNNWP



>L20 

EIVLTQSPATLSLSPGERATLSCRASQGVSSYLAWYQQKPGQAPRLLIYDASNRATGIPA
RFSGSGPGTDFTLTISSLEPEDFAVYYCQQRSNWH



>L22 

DIQMIQSPSFLSASVGDRVSIICWASEGISSNLAWYLQKPGKSPKLFLYDAKDLHPGVSS 
RFSGRGSGTDFTLTIISLKPEDFAAYYCKQDFSYPP



>L23 

AIRMTQSPFSLSASVGDRVTITCWASQGISSYLAWYQQKPAKAPKLFIYYASSLQSGVPS
RFSGSGSGTDYTLTISSLQPEDFATYYCQQYYSTP



>L24 



VIWMTQSPSLLSASTGDRVTISCRMSQGISSYLAWYQQKPGKAPELLIYAASTLQSGVPS 
RFSGSGSGTDFTLTISCLQSEDFATYYCQQYYSFP



>L25 

EIVMTQSPATLSLSPGERATLSCRASQSVSSSYLSWYQQKPGQAPRLLIYGASTRATGIP 
ARFSGSGSGTDFTLTISSLQPEDFAVYYCQQDYNLP 



>L4/18a 

AIQLTQSPSSLSASVGDRVTITCRASQGISSALAWYQQKPGKAPKLLIYDASSLESGVPS
RFSGSGSGTDFTLTISSLQPEDFATYYCQQFNSYP



>L5 

DIQMTQSPSSVSASVGDRVTITCRASQGISSWLAWYQQKPGKAPKLLIYAASSLQSGVPS 
RFSGSGSGTDFTLTISSLQPEDFATYYCQQANSFP
>L6

EIVLTQSPATLSLSPGERATLSCRASQSVSSYLAWYQQKPGQAPRLLIYDASNRATGIPA 
RFSGSGSGTDFTLTISSLEPEDFAVYYCQQRSNWP 



>L8 

DIQLTQSPSFLSASVGDRVTITCRASQGISSYLAWYQQKPGKAPKLLIYAASTLQSGVPS 
RFSGSGSGTEFTLTISSLQPEDFATYYCQQLNSYP 



>L9 

AIRMTQSPSSFSASTGDRVTITCRASQGISSYLAWYQQKPGKAPKLLIYAASTLQSGVPS 
RFSGSGSGTDFTLTISCLQSEDFATYYCQQYYSYP 



>O1

DIVMTQTPLSLPVTPGEPASISCRSSQSLLDSDDGNTYLDWYLQKPGQSPQLLIYTLSYR 
ASGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCMQRIEFP 



>O11

DIVMTQTPLSLPVTPGEPASISCRSSQSLLDSDDGNTYLDWYLQKPGQSPQLLIYTLSYR 
ASGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCMQRIEFP 



>O12

DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIYAASSLQSGVPS
RFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSTP



>O14

DIQLTQSPSSLSASVGDRVTITCRVSQGISSYLNWYRQKPGKVPKLLIYSASNLQSGVPS
RFSGSGSGTDFTLTISSLQPEDVATYYGQRTYNAPP



>O18

DIQMTQSPSSLSASVGDRVTITCQASQDISNYLNWYQQKPGKAPKLLIYDASNLETGVPS 
RFSGSGSGTDFTFTISSLQPEDIATYYCQQYDNLP 



>O2

DIQMTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIYAASSLQSGVPS 
RFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSTP 



>O4

DIQLTQSPSSLSASVGDRVTITCRVSQGISSYLNWYRQKPGKVPKLLIYSASNLQSGVPS
RFSGSGSGTDFTLTISSLQPEDVATYYGQRTYNAPP



>O8

DIQMTQSPSSLSASVGDRVTITCQASQDISNYLNWYQQKPGKAPKLLIYDASNLETGVPS 
RFSGSGSGTDFTFTISSLQPEDIATYYCQQYDNLP 
The following sequences are taken from a study that has sequenced the entire human lambda gene locus
(Kawasaki K. et al. 1997).

Total number of sequences: 36 

>V1-11 

QSVLTQPPSVSEAPRQRVTISCSGSSSNIGNNAVNWYQQLPGKAPKLLIYYDDLLPSGVS
DRFSGSKSGTSASLAISGLQSEDEADYYCAAWDDSLNGP



>V1-13 

QSVLTQPPSVSGAPGQRVTISCTGSSSNIGAGYDVHWYQQLPGTAPKLLIYGNSNRPSGV 
PDRFSGSKSGTSASLAITGLQAEDEADYYCQSYDSSLSGS 



>V1-16

QSVLTQPPSASGTPGQRVTISCSGSSSNIGSNTVNWYQQLPGTAPKLLIYSNNQRPSGVP
DRFSGSKSGTSASLAISGLQSEDEADYYCAAWDDSLNGP



>V1-17 

QSVLTQPPSASGTPGQRVTISCSGSSSNIGSNYVYWYQQLPGTAPKLLIYSNNQRPSGVP
DRFSGSKSGTSASLAISGLRSEDEADYYCAAWDDSLSGP



>V1-18 

QSVLTQPPSVSGAPGQRVTISCTGSSSNIGAGYVVHWYQQLPGTAPKLLIYGNSNRPSGV
PDQFSGSKSGTSASLAITGLQSEDEADYYCKAWDNSLNA 



>V1-19 

QSVLTQPPSVSAAPGQKVTISCSGSSSNIGNNYVSWYQQLPGTAPKLLIYDNNKRPSGIP
DRFSGSKSGTSATLGITGLQTGDEADYYCGTWDSSLSAG 



>V1-2

QSALTQPPSASGSPGQSVTISCTGTSSDVGGYNYVSWYQQHPGKAPKLMIYEVSKRPSGV
PDRFSGSKSGNTASLTVSGLQAEDEADYYCSSYAGSNNF 



>V1-20

QAGLTQPPSVSKGLRQTATLTCTGNSNIVGNQGAAWLQQHQGHPPKLLSYRNNNRPSGIS 
ERFSASRSGNTASLTITGLQPEDEADYYCSALDSSLSA 



>V1-22

NFMLTQPHSVSESPGKTVTISCTRSSGSIASNYVQWYQQRPGSSPTTVIYEDNQRPSGVP 
DRFSGSIDSSSNSASLTISGLKTEDEADYYCQSYDSSN 



>V1-3

QSALTQPRSVSGSPGQSVTISCTGTSSDVGGYNYVSWYQQHPGKAPKLMIYDVSKRPSGV
PDRFSGSKSGNTASLTISGLQAEDEADYYCCSYAGSYTF 



>V1-4

QSALTQPASVSGSPGQSITISCTGTSSDVGGYNYVSWYQQHPGKAPKLMIYEVSNRPSGV 
SNRFSGSKSGNTASLTISGLQAEDEADYYCSSYTSSSTL 
>V1-5

QSALTQPPSVSGSPGQSVTISCTGTSSDVGSYNRVSWYQQPPGTAPKLMIYEVSNRPSGV 
PDRFSGSKSGNTASLTISGLQAEDEADYYCSLYTSSSTF 



>V1-7

QSALTQPASVSGSPGQSITISCTGTSSDVGSYNLVSWYQQHPGKAPKLMIYEGSKRPSGV 
SNRFSGSKSGNTASLTISGLQAEDEADYYCCSYAGSSTF 



>V1-9

QSALTQPPFVSGAPGQSVTISCTGTSSDVGDYDHVFWYQKRLSTTSRLLIYNVNTRPSGI 
SDLFSGSKSGNMASLTISGLKSEVEANYHCSLYSSSYTF 



>V2-1 

SYELTQPPSVSVSPGQTASITCSGDKLGDKYACWYQQKPGQSPVLVIYQDSKRPSGIPER 
FSGSNSGNTATLTISGTQAMDEADYYCQAWDSSTA



>V2-11 

SYELTQPPSVSVSLGQMARITCSGEALPKKYAYWYQQKPGQFPVLVIYKDSERPSGIPER 
FSGSSSGTIVTLTISGVQAEDEADYYCLSADSSGTYP 



>V2-13 

SSELTQDPAVSVALGQTVRITCQGDSLRSYYASWYQQKPGQAPVLVIYGKNNRPSGIPDR 
FSGSSSGNTASLTITGAQAEDEADYYCNSRDSSGNHL 



>V2-14 

SYVLTQPPSVSVAPGQTARITCGGNNIGSKSVHWYQQKPGQAPVLVVYDDSDRPSGIPER
FSGSNSGNTATLTISRVEAGDEADYYCQVWDSSSDHP



>V2-15 

SYELTQLPSVSVSPGQTARITCSGDVLGENYADWYQQKPGQAPELVIYEDSERYPGIPER 
FSGSTSGNTTTLTISRVLTEDEADYYCLSGDEDNP 



>V2-17 

SYELTQPPSVSVSPGQTARITCSGDALPKQYAYWYQQKPGQAPVLVIYKDSERPSGIPER
FSGSSSGTTVTLTISGVQAEDEADYYCQSADSSGTYP



>V2-19 

SYELTQPSSVSVSPGQTARITCSGDVLAKKYARWFQQKPGQAPVLVIYKDSERPSGIPER 
FSGSSSGTTVTLTISGAQVEDEADYYCYSAADNNL



>V2-6 

SYELTQPLSVSVALGQTARITCGGNNIGSKNVHWYQQKPGQAPVLVIYRDSNRPSGIPER 
FSGSNSGNTATLTISRAQAGDEADYYCQVWDSSTA



>V2-7 

SYELTQPPSVSVSPGQTARITCSGDALPKKYAYWYQQKSGQAPVLVIYEDSKRPSGIPER 
FSGSSSGTMATLTISGAQVEDEADYYCYSTDSSGNH
>V2-8

SYELTQPHSVSVATAQMARITCGGNNIGSKAVHWYQQKPGQDPVLVIYSDSNRPSGIPER 
FSGSNPGNTATLTISRIEAGDEADYYCQVWDSSSDHP 



>V3-2 

QTVVTQEPSLTVSPGGTVTLTCASSTGAVTSGYYPNWFQQKPGQAPRALIYSTSNKHSWT
PARFSGSLLGGKAALTLSGVQPEDEAEYYCLLYYGGAQ 



>V3-3 

QAVVTQEPSLTVSPGGTVTLTCGSSTGAVTSGHYPYWFQQKPGQAPRTLIYDTSNKHSWT 
PARFSGSLLGGKAALTLLGAQPEDEAEYYCLLSYSGAR 



>V3-4 

QTVVTQEPSFSVSPGGTVTLTCGLSSGSVSTSYYPSWYQQTPGQAPRTLIYSTNTRSSGV 
PDRFSGSILGNKAALTITGAQADDESDYYCVLYMGSGIS



>V4-1 

QPVLTQPPSSSASPGESARLTCTLPSDINVGSYNIYWYQQKPGSPPRYLLYYYSDSDKGQ 
GSGVPSRFSGSKDASANTGILLISGLQSEDEADYYCMIWPSNAS 



>V4-2 

QAVLTQPSSLSASPGASASLTCTLRSGINVGTYRIYWYQQKPGSPPQYLLRYKSDSDKQQ 
GSGVPSRFSGSKDASANAGILLISGLQSEDEADYYCMIWHSSAS 



>V4-3 

QPVLTQPTSLSASPGASARLTCTLRSGINLGSYRIFWYQQKPESPPRYLLSYYSDSSKHQ
GSGVPSRFSGSKDASSNAGILVISGLQSEDEADYYCMIWHSSAS 



>V4-4 

QPVLTQPSSHSASSGASVRLTCMLSSGFSVGDFWIRWYQQKPGNPPRYLLYYHSDSNKGQ 
GSGVPSRFSGSNDASANAGILRISGLQPEDEADYYCGTWHSNSKT 



>V4-6 

RPVLTQPPSLSASPGATARLPCTLSSDLSVGGKNMFWYQQKPGSSPRLFLYHYSDSDKQL 
GPGVPSRVSGSKETSSNTAFLLISGLQPEDEADYYCQVYESSAN 



>V5-1 

LPVLTQPPSASALLGASIKLTCTLSSEHSTYTIEWYQQRPGRSPQYIMKVKSDGSHSKGD 
GIPDRFMGSSSGADRYLTFSNLQSDDEAEYHCGESHTIDGQVG 



>V5-2 

QPVLTQPPSASASLGASVTLTCTLSSGYSNYKVDWYQQRPGKGPRFVMRVGTGGIVGSKG 
DGIPDRFSVLGSGLNRYLTIKNIQEEDESDYHCGADHGSGSNFV



>V5-4 

QPVLTQSSSASASLGSSVKLTCTLSSGHSSYIIAWHQQQPGKAPRYLMKLEGSGSYNKGS 
GVPDRFSGSSSGADRYLTISNLQFEDEADYYCETWDSNT 



>V5-6

QLVLTQSPSASASLGASVKLTCTLSSGHSSYAIAWHQQQPEKGPRYLMKLNSDGSHSKGD 
GIPDRFSGSSSGAERYLTISSLQSEDEADYYCQTWGTG



ABSTRACT

The Kabat Database was initially started in 1970 to 
determine the combining site of antibodies based on 
the available amino acid sequences at that time. 
Bence Jones proteins, mostly from human, were 
aligned, using the now-known Kabat numbering 
system, and a quantitative measure, variability, was 
calculated for every position. Three peaks, at positions 
24-34, 50-56 and 89-97, were identified and proposed 
to form the complementarity determining regions 
(CDR) of light chains. Subsequently, antibody heavy 
chain amino acid sequences were also aligned using 
a different numbering system, since the locations of 
their CDRs (31-35B, 50-65 and 95-102) are different 
from those of the light chains. CDRL1 starts right 
after the first invariant Cys 23 of light chains, while 
CDRH1 is eight amino acid residues away from the 
first invariant Cys 22 of heavy chains. During the past 
30 years, the Kabat database has grown to include 
nucleotide sequences, sequences of T cell receptors 
for antigens (TCR), major histocompatibility complex 
(MHC) class I and II molecules and other proteins of 
immunological interest. It has been used extensively 
by immunologists to derive useful structural and 
functional information from the primary sequences 
of these proteins. An overall view of the Kabat Database
and its various applications is summarized here.
The Kabat Database is freely available at http://immuno.bme.nwu.edu

INTRODUCTION 

The purpose of maintaining the Kabat Database of aligned 
sequences of proteins of immunological interest, in our 
opinion, is to provide useful correlations between structure and 
function for this special group of proteins from their nucleotide 
and amino acid sequences to their tertiary structures (1). These 
sequences are thus aligned with the ultimate aim of under- 
standing how these proteins are folded and how they can 
perform their biological functions. We include only coding 
region sequences that have been published. In some cases, only the 
amino acid sequences were published, while the corresponding 
nucleotide sequences were deposited in GenBank. All stored 



sequences were then printed out and checked visually against 
available published sequences. We routinely survey for 
possible new sequences in journals in our libraries, Medline 
entries, cross-references from other papers, and author notification; 
however, we may still miss some sequences. GenBank, on the 
other hand, contains a substantial number of unpublished 
sequences. If there are doubts about these sequences or their 
annotations, please refer to the original papers. The Kabat 
numbering systems (see the Introduction of 2) for antibody 
light and heavy chains, for TCR alpha and beta chains, etc., go 
hand-in-hand with variability calculations. The locations of the 
CDRs are the theoretically derived positions which can be 
verified experimentally. Indeed, from the first antigen-antibody
Fab complex (3) to the complexes of TCR, processed peptide 
and MHC class I molecule (4,5), it has been realized that alignment 
of amino acid sequences and variability calculations can be of 
utmost importance in understanding how these important 
macromolecules function biologically. Due to the rapid devel- 
opment of genetic and protein engineering methods, mouse 
and rat antibodies have been humanized to treat human 
cancers, viral infections, etc (6). CDRs of selected rodent anti- 
bodies are cut out and glued onto human antibody frameworks 
to minimize rejection by human patients. 

Our predicted CDRs are slightly different from Chothia's. A
careful comparison can be found from a hyperlink on our
website to 'Andrew's Antibody Page'
(http://www.biochem.ucl.ac.uk/~martin/abs/index.html).

Massive amounts of sequence data are being continuously 
published in the scientific literature. It is imperative to collect 
and properly align the sequences so that they can be used by as 
many researchers in this field as possible. We have previously 
published five editions of these sequences (see the Introduction 
of 2). In 1991, the fifth edition (2) consisted of three volumes. 
Currently, the database is more than five times as large. As of 
September 29, 1999, the Kabat database contained 1 599 375 
and 2 517 756 nt for antibody light and heavy chain variable 
regions, respectively, as compared to 272 244 and 418 962 nt 
in 1991. Total numbers of entries, amino acids and bases of 
other categories of sequences can be obtained by using the 
'Current Counts' hyperlink on our website. The collection is
available on our website (http://www.immuno.bme.nwu.edu),
which is free due to the generous support by various research
grants from NIH since 1970.

Finally, numerous scientific papers have cited our database, 
quoting our fourth edition (7), fifth edition (2), or one of our 
more recent papers (8). On our part, we have been analyzing
the Kabat Database during the past few years with reference to
the total numbers of antibody and TCR V-genes, possible 
evolutionary selection processes, importance of antibody 
CDRH3s as related to their fine specificities, etc. 

KABAT DATABASE 

The Kabat Database may be accessed for searching, sequence 
retrieval and analysis by a few different methods: electronic 
mail, WWW and ftp. The electronic mail interface has been 
available since 1993, the WWW interface since 1995 and 
various formats of the database in electronic format for nearly 
a decade (8). Our data formats, searching tools, output formats 
and database structures have gradually been adopted by other
immunological databases and interfaces. 

Electronic mail interface 

An electronic mail interface (seqhunt2@immuno.bme.nwu.edu)
provides a non-interactive method for searching and sequence 
retrieval (9). Sending mail to the server address with the single 
word 'help' (no quotes) in the message body returns instructions 
for using the server. 

All sequence classes are searchable and returnable. The
query format allows making AND/OR/NOT constructed
restrictions on the database and amino acid and nucleotide
sequence pattern matching with allowable differences.
Requests are processed as they are received and, depending on
the network traffic, take ~1-2 min to be searched and returned
to the sender. The returned format is a fixed-line length record
of 80 or fewer characters per line for ease in visual inspection 
and processing by user-written scripts or programs. The characters 
are plain text. 

The query format for the sent request consists of two parts. 
The first part contains directives for the server to follow while 
the second part contains specifications of the search. Specification 
of the extent of data returned, the number of documents to 
return, starting document and whether plain ASCII text or 
PostScript should be used in the return format may be entered. 
Further, one can direct the server to return a distribution, the 
variability or unaligned raw data for the search specified. 

The second part of the query contains the search restrictions 
on the database. Words separated by AND and OR may be 
used, as well as searching functions, like nucleotide/amino 
acid pattern matching and positional restriction matching. 

There are basically three steps in translating and performing 
a search on the Kabat Database: generate the question or query, 
translate it into a format the server can recognize and decide on 
the output options desired of the returned matches. For 
example, if matches of mouse kappa light chains of anti- 
phosphorylcholine antibodies are desired, the query and 
restriction on the database would be:

Begin

@mouse and kappa and phosphorylcholine

The '@' before mouse tells the server that matches of the
species mouse are desired, rather than searching through the 
entire database record for instances of the word 'mouse'. More 
complicated restrictions can be generated using parentheses for 
grouping and the minus sign for NOT. Finding all rat and 
rabbit sequences which are not kappa light chains, and 
returning them as amino acid sequences in PostScript format 
would be constructed as: 



PSAA 
Begin 

(rat and rabbit) and -kappa 

Pattern matching is interpreted as the second part of an AND
statement, such that finding all rat and rabbit sequences which
are not kappa and contain the nucleotide pattern cagtacgtcag
with three allowable mismatches, would be sent as:

Begin

(rat and rabbit) and -kappa [ implicit AND ] 

#NM3 

cagtacgtcag 

More examples of searching and output options may be found
in the 'help' file returned from the server.
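
The two-part request described above can be assembled programmatically. The following is a minimal Python sketch that assumes only the layout visible in the examples of this section: optional directive lines such as PSAA, the keyword Begin, the restriction, and an optional pattern directive such as #NM3 followed by the pattern. The function name build_query and its parameters are hypothetical, and the 'help' file returned by the server remains the authoritative description of the syntax.

```python
def build_query(restriction, directives=(), pattern_directive=None, pattern=None):
    """Assemble a mail body in the layout shown in the examples above.

    Hypothetical helper: only directives that appear in the text
    (e.g. 'PSAA', '#NM3') are known, and nothing here is validated
    against the real server.
    """
    if (pattern_directive is None) != (pattern is None):
        raise ValueError("pattern_directive and pattern must be given together")
    lines = list(directives)              # part 1: directives for the server
    lines += ["Begin", "", restriction]   # part 2: restriction on the database
    if pattern is not None:
        # The pattern is treated as an implicit AND with the restriction.
        lines += ["", pattern_directive, "", pattern]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The three example requests from this section.
    print(build_query("@mouse and kappa and phosphorylcholine"))
    print(build_query("(rat and rabbit) and -kappa", directives=["PSAA"]))
    print(build_query("(rat and rabbit) and -kappa",
                      pattern_directive="#NM3", pattern="cagtacgtcag"))
```

Keeping the directives and the restriction as separate arguments mirrors the two parts of the request, so the same restriction can be reused with different output options.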

WWW interface 

The WWW interface (8) to the Kabat Database: http://immuno. 
bme.nwu.edu contains searching and analysis tools as well as 
links to database download sites and other interesting databases. 
Most of the features found in the electronic mail interface are 
found in the WWW interface, as well as other tools. The 
WWW interface is more interactive than the Email and returns 
results faster, depending on the network traffic. 

Searching and analysis tools 

SeqhuntII. This grouping of programs allows searches through
the annotations and sequence pattern matching of the amino 
acid and nucleotide sequence data with allowable mismatches. 
Like the Email server, restrictions on the database may be 
formulated as AND/OR/NOT constructs. Output extent, output 
format, maximum documents and starting document may be 
specified. Browsing of the output results in HTML format 
allows the user to view the database entries in an easy-to-read
format. ASCII text may be selected as output for use in user-
generated scripts and programs. PostScript generation allows
for printing on a PostScript supporting printer. Sequence
matching is returned aligned with the target sequence with
nucleotide or amino acid differences from the database
sequence displayed in a case change. Since the database
contains only coding regions of genes and proteins, the query
sequence should be a portion of the coding region being sought.
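
Pattern matching with allowable mismatches, with differences shown by a case change, can be illustrated as follows. This is a minimal Python sketch and not the server's implementation: it assumes that a mismatch means a substitution at a fixed offset (no insertions or deletions), and the function name find_matches is hypothetical.

```python
def find_matches(pattern, target, max_mismatches):
    """Return (offset, mismatches, display) for every window of `target`
    that matches `pattern` with at most `max_mismatches` substitutions.

    `display` is the matched window with differing positions in lower
    case and identical positions in upper case.
    """
    pattern = pattern.upper()
    target = target.upper()
    hits = []
    for start in range(len(target) - len(pattern) + 1):
        window = target[start:start + len(pattern)]
        diffs = [p != t for p, t in zip(pattern, window)]
        if sum(diffs) <= max_mismatches:
            display = "".join(t.lower() if d else t
                              for t, d in zip(window, diffs))
            hits.append((start, sum(diffs), display))
    return hits


if __name__ == "__main__":
    # Query pattern from the text, searched in an invented target string.
    for hit in find_matches("cagtacgtcag", "ttcagtacgtcagtt", 3):
        print(hit)
```

The sliding window is the simplest correct approach for short patterns; a production search over a large database would normally use an index or a bit-parallel algorithm instead.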

Variability. Variability and amino acid distributions of 
sequence groups may be generated for restrictions on the data- 
base. The variability plots are in PostScript format and may 
either be viewed on the screen with an appropriate PostScript 
viewer (e.g. GNU ghostscript or ghostview) or printed to a 
postscript-supporting printer. Plots for human and mouse TCR 
gamma and delta chain variable regions are shown in Figure 1.
Scaling of the variability plots may be done allowing comparison 
of variability plots for different groupings of sequences. 
Distributions of the amino acids per position may be returned 
also, including the calculated variability for each position. 
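
The variability calculation can be sketched in a few lines. The text here does not spell out the formula, so the sketch assumes the standard Wu and Kabat definition: the number of different residues observed at a position divided by the frequency of the most common residue at that position. The function name variability and the gap character '-' are assumptions made for illustration.

```python
from collections import Counter


def variability(aligned, gap="-"):
    """Per-position variability for equal-length, pre-aligned sequences.

    variability = (number of different residues at the position)
                  / (frequency of the most common residue there)
    Gap characters are ignored; an all-gap column yields 0.0.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    result = []
    for column in zip(*aligned):
        residues = [r for r in column if r != gap]
        if not residues:
            result.append(0.0)
            continue
        counts = Counter(residues)
        top_frequency = counts.most_common(1)[0][1] / len(residues)
        result.append(len(counts) / top_frequency)
    return result


if __name__ == "__main__":
    # Toy alignment (invented): position 1 is invariant, position 2 varies.
    print(variability(["AC", "AC", "AD", "AE"]))
```

Under this definition an invariant position scores 1, and for amino acids the theoretical maximum is 400 (all 20 residues equally frequent), which is why peaks in the plot stand out so clearly.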

Sequence alignment. Alignment of user-entered coding regions 
of immunoglobulin light chains according to the Kabat 
numbering system can be performed. Because of the relatively 
few alignment options available for light chains, most 
sequences can be aligned. One can start with around 10 amino 
acid residues or 30 nt. There is no lower limit on the length of 
sequence to be matched. In some cases though, visual inspection 
and alignment is necessary, as is for heavy chain alignment,
especially within the CDRH3 region, if additional codons or
residues are inserted and denoted by '#'. If a suitable alignment
counterpart from the database is not found for the target
sequence, the user can contact us.

Figure 1. Variability plots for human and mouse TCR gamma and delta chain variable regions, using 377 human gamma, 1260 human delta, 297 mouse gamma
and 461 mouse delta partial and complete sequences.

FTP. Various formats of the database are available for down-
load from NCBI's repository under the directory 'kabat'.
Currently active formats include a FASTA-like raw sequence
format and the database's fixed length format of 80 or fewer
characters per line and vertical alignment. Four main variations
on the fixed length format exist to properly visually display
single translations, pseudogene translations, J-minigenes and
D-minigenes. Other immunological databases have adopted
similar formats as exemplified by the three letter code amino
acid translation followed by single letter code. A 'dump'
version of the database is periodically updated which contains
unlimited line length records more suitable for mass
processing on unix-based systems.

OTHER APPLICATIONS

As mentioned before, the Kabat Database was initially 
constructed for the purpose of identifying the antibody 
combining site (1). Starting from aligned amino acid sequences 
and using variability calculations, we have identified CDRs of 
antibody light and heavy chains, as well as those of TCRs. 
Such calculations can also provide useful predictions for MHC
class I and II sequences (8), and for other aligned protein
sequences, e.g. HIV gp120, gp41, etc.

The importance of CDRH3 to confer fine specificity to anti- 
bodies was realized a few years ago (10). Furthermore, the 
unique CDRH3 nucleotide sequences have recently been used 
as a sensitive diagnostic test to detect residual B cell malignancies
in cancer patients. Thus, many of these sequences have been
determined. But most of them have been excluded from
GenBank due to their relatively short lengths. We have been
meticulously collecting them, and realized the importance of
their length distributions in antibodies of various specificities
(11) and possible differences between CDRH3s of human and
mouse (12). In the case of rabbit, the CDRH3s have less length 
variation than human and mouse. This may be compensated by 
the length variations of the CDRL3s (13). 
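
A length distribution of the kind discussed above is a simple tally. The following Python sketch assumes the CDR3 segments have already been extracted as strings; the function names are hypothetical and the example strings are invented placeholders, not sequences from the database.

```python
from collections import Counter


def length_distribution(cdr3_sequences):
    """Map CDR3 length -> number of sequences of that length, sorted by length."""
    counts = Counter(len(seq) for seq in cdr3_sequences)
    return dict(sorted(counts.items()))


def length_spread(cdr3_sequences):
    """Difference between the longest and shortest CDR3 in the collection."""
    lengths = [len(seq) for seq in cdr3_sequences]
    if not lengths:
        raise ValueError("no sequences given")
    return max(lengths) - min(lengths)


if __name__ == "__main__":
    placeholder_cdr3s = ["AAA", "BBBB", "CCC"]  # invented strings
    print(length_distribution(placeholder_cdr3s))
    print(length_spread(placeholder_cdr3s))
```

Comparing the spread between two collections, for example heavy-chain and light-chain CDR3s of one species, is one way to express the compensation suggested for rabbit antibodies.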



The length variations of TCR alpha and beta chain CDR3s 
are very restricted (14). On the other hand, TCR gamma and 
delta chain CDR3s have more length variation, close to those 
of antibody heavy chains (Fig. 2). Whether they bind antigens 
directly is unclear. 

During recent years, various research groups have decided to 
sequence the entire coding region of different antibody and 
TCR V-genes, in order to have an idea of their total numbers. On 
the other hand, we have discovered that pair-wise comparisons of 
V-gene nucleotide sequences in the Kabat Database provide 
very accurate estimations of their total numbers (15,16). In 
addition, such comparisons seem to suggest that antibody and 
TCR V-genes have evolved under different selective pressures 
(17). In the case of MHC class I sequences, comparison of their 
aligned sequences has elucidated a new mechanism of generating 
novel MHC class I molecules by random assortment of their α1
and α2 gene segments (18).

DISCUSSION 

The Kabat Database has been around for 30 years. It has 
provided the immunology community a useful service, since it
not only is a sequence database but also incorporates vital
aspects of the biology of the immune system. Various analytical 
methods have been developed to study the structure and function 
relations of proteins of immunological interest. 
Electronic addresses: 
http://immuno.bme.nwu.edu 
seqhunt2@immuno.bme.nwu.edu 
Citing the Kabat Database: 

Authors using this database may cite this paper together with 
the electronic addresses.

The antibody molecule plays the central role in humoral
immunity by attaching to pathogens and then recruiting 
effector systems to destroy the invaders. In doing so, it 
embodies two antagonistic tendencies — diversity and 
commonality — since it must possess both a variable sur- 
face to recognize different foreign epitopes, and a con-
stant surface that its own effector systems can recognize.
Efforts to explain this duality continue to yield unex- 
pected insights into immunology, biophysics, and molec- 
ular and evolutionary genetics — placing immunoglobu- 
lins among the most fruitful research subjects ever 
studied. In immunologic thought certainly, antibody re-
search has been critical. For example, the paradoxical
appearance of a highly variable N terminal coupled to a
constant C region in immunoglobulin chains led to the
first articulation of the concept of "two genes, one poly-
peptide chain" (1) and eventually to class switching (see
Chapters 10 and 22). Questions on the origin of
binding-site diversity led to the discovery of VDJ recom-
bination, a mechanism also operative in the T cell recep-
tor locus (see Chapter 10). General biology also benefits
from the immunoglobulin. For example, for all their 
complexity, antibodies are composed of segments de- 
rived from a single structural building block — the 
immunoglobulin domain. This domain is found in an 
ever-growing superfamily of proteins involved in immu- 
nologic, developmental, and neurologic cell recognition 
(2). In a different direction, several workers have gener- 
ated antibody-binding sites that catalyze chemical reac- 
tions (3) — providing a new means of examining struc- 
ture-function relationships in catalysis.

TABLE 2. Numerical bounds of statistical and structural
features of variable domains

Statistical                             Corresponding
feature       Chain   Position          structural loop

FR1           Heavy   1-30
              Light   1-23
CDR1          Heavy   31-35 (35a,b)     H1: 26-32
              Light   24-34 (27a-f)     L1: 26-33
FR2           Heavy   36-49
              Light   35-49
CDR2          Heavy   50-65 (52a-c)     H2: 53-55
              Light   50-56             L2: 50-52
FR3           Heavy   66-94 (82a-c)
              Light   57-88
CDR3          Heavy   95-102 (100a-k)   H3: 96-101
              Light   89-97 (95a-f)     L3: 91-96
FR4           Heavy   103-113
              Light   98-107 (106a)

CDR (complementarity-determining regions), FR (frame- 
works), H1-3 (heavy chain variable loops), and L1-3 (light 
chain variable loops) are numbered according to Kabat et al.
(23). The numbering system serves to optimally align con- 
served residues. Length variations due to insertions/dele- 
tions during evolution or junctional diversification are labeled 
with letters. For example, heavy chain position 52 may repre- 
sent up to three residues depending on the germline V gene 
employed. Data for the table were taken from Kabat et al. (23)
and Chothia and Lesk (27). 
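
The boundaries in Table 2 lend themselves to a simple lookup. The Python sketch below encodes only the statistical (Kabat) boundaries from the table and assumes that an insertion letter (e.g. 52a, 100k) belongs to the region of its base number; the names KABAT_REGIONS and region_of are hypothetical, and the sketch does not check whether a given insertion letter is actually permitted at that position.

```python
import re

# Region boundaries (inclusive Kabat position numbers) taken from Table 2.
KABAT_REGIONS = {
    "heavy": [("FR1", 1, 30), ("CDR1", 31, 35), ("FR2", 36, 49),
              ("CDR2", 50, 65), ("FR3", 66, 94), ("CDR3", 95, 102),
              ("FR4", 103, 113)],
    "light": [("FR1", 1, 23), ("CDR1", 24, 34), ("FR2", 35, 49),
              ("CDR2", 50, 56), ("FR3", 57, 88), ("CDR3", 89, 97),
              ("FR4", 98, 107)],
}


def region_of(chain, position):
    """Return the region name for a Kabat position label such as 52 or '52a'."""
    match = re.fullmatch(r"(\d+)([a-z]?)", str(position).lower())
    if match is None:
        raise ValueError(f"not a Kabat position label: {position!r}")
    number = int(match.group(1))
    for name, first, last in KABAT_REGIONS[chain]:
        if first <= number <= last:
            return name
    raise ValueError(f"position {position!r} is outside the {chain} chain V domain")


if __name__ == "__main__":
    print(region_of("light", "27c"))
    print(region_of("heavy", "100k"))
```

Such a lookup only applies once a sequence has been numbered according to the Kabat system; assigning the numbers in the first place requires the alignment step described earlier.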

cepts involved. The stereoviews depict the main-chain 
traces of two Fabs with the CDR a-carbons marked by 
dots and the heavy chain denoted by heavier lines. The 
VL-VH interface appears edge-on with strands G(3-3) of
the light chain and C of the heavy chain closest to the 
viewer (reference to Fig. 4 may help orientation). The 
dots representing CDRs clearly cluster together at the 
top of the Fab to generate the binding site (Fig. 8B even 
shows the peptide ligand bound in the site). The crucial 
point to note is the very different CDR structure in the 
context of similar FR and CH-CL module backbones.
The three-dimensional morphologies of V regions will 
be covered in more detail following further examination 
of the sequences that lead to binding site diversity. 

V-Region Sequences 

Variable region sequences do not randomly differ rela- 
tive to each other — even within the CDRs. Analyses of 
hundreds of V regions reveal that the sequences natu- 
rally fall into a homology-based hierarchy directly re- 
lated to the germline antibody gene loci. Members 
within a hierarchical group are more similar to each 
other than to all sequences from other groups; further- 
more, similar sequences display a shared pattern of 
amino acid substitutions that serve as "membership 
badges" for the various classifications. Through examina- 
tion of these linked-substitutions within and between 
species, an evolutionary history of V regions can be ob-
tained. Of course, the oldest and most basic group is that
of the V regions themselves, followed by the division into
VH, Vκ, and Vλ representing separate V-gene loci on dif-
ferent chromosomes. VH, Vκ, and Vλ, in turn, split into
V-region subgroups, or "families" (reviewed in ref. 25).
The current requirements for membership in a family
are based on nucleotide cross hybridization correspond-
ing to 80% homology at the DNA level. At the protein
level this translates to about 75% identity within families
and less than 70% (usually 30-60%) between families. By
this criterion, VH splits into six families as do Vκ and Vλ.
Table 3 shows the percent identity between representa-
tive members of the six human VH families. Note how
the comparisons of family members (boxed numbers)
yield identity values over 80%. The split into six families
antedates the primate-rodent divergence since mice pos-
sess analogous subgroups. Figure 9 aligns six sequences,
two each from VH, Vκ and Vλ, to demonstrate
linked substitutions. All six sequences display a V-
specific motif in FR4 (W/F-G-X-G) (19); this provides a
β-bulge in strand 3-3 crucial for correct dimerization. In
turn, other patterns distinguish heavy from light chains.
Note the VH-specific sequence (G-L-E-W-hydrophobic)
in the middle of FR2 as compared to the VL-specific
motif (P-hydrophilic-hydrophobic-L-hydrophobic) in
the analogous location; these residues induce β-bulges in
strand C also for V dimerization (26). More extensive
patterns differentiate the families from each other, as
shown in Fig. 10 for some κ and H family members. The
importance of V-region classification reaches beyond ac- 
ademic or evolutionary interest. The splitting of families 
can be continued down to subfamilies and finally individ- 
ual genes. Thus family extent and membership directly 
reflect the germline antibody repertoire of an individual. 
Since the V-region loci are known to be polymorphic, 
presence or absence of a given gene may influence the 
predisposition to autoimmunity (25) (see Chapter 30) 
and/or the prenatal "education" of the immune system 
through idiotypic networks (see Chapter 24). More gener- 
ally, the various gene groups provide the universe of 
starting structures available to the humoral immune sys- 
tem except for CDR3. Correlation of these sequences 
with their corresponding structures provides not only in- 
sights to the limits of binding site variability, but also a 
means for semiempirical prediction of V-region struc- 
ture based on the sequence (27,28). 
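
The identity figures quoted for family membership can be reproduced with a position-by-position comparison. The Python sketch below assumes the two protein sequences are already aligned to equal length and contain no gaps; the 75% default threshold is taken from the "about 75% identity within families" statement above and is an approximation, not a formal rule. The function names are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences are identical."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    identical = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * identical / len(seq_a)


def same_family(seq_a, seq_b, threshold=75.0):
    """Rough family test: identity at or above the assumed protein-level cutoff."""
    return percent_identity(seq_a, seq_b) >= threshold


if __name__ == "__main__":
    # Invented four-residue strings, used only to show the arithmetic.
    print(percent_identity("ABCD", "ABCE"))
    print(same_family("ABCD", "AXYZ"))
```

Because values between 70% and 75% fall in neither band described in the text, a caller that needs a firm classification should treat that interval as undecided rather than rely on the single cutoff.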

Full sequence variability results from participation of
the germline genes in the somatic diversification system
described in Chapter 10. In this system, about 100 func-
tional V genes present at each of the VH, Vκ and Vλ loci
encode unique (FR1-CDR1-FR2-CDR2-FR3-5' CDR3)
segments while four to six "joining (J)" minigenes at
each locus encode a unique (3' CDR3-FR4) section. The
heavy chain employs an additional minigene, termed
"diversity (D)," to encode the central portion of CDR3;
about 30 unique D genes exist between the VH and JH

Two Acquired Immunodeficiency Syndrome-Associated Burkitt's
Lymphomas Produce Specific Anti-i IgM Cold Agglutinins
Using Somatically Mutated VH4-21 Segments

By Piersandro Riboldi, Gianluca Gaidano, Edward W. Schettino, Thomas G. Steger, Daniel M. Knowles, 

Riccardo Dalla-Favera, and Paolo Casali 



We analyzed the reactivity and the structure of the VH and
VL segments of two IgM monoclonal antibodies (MoAbs)
produced by spontaneously in vitro outgrowing cell lines,
HBL-2 and HBL-3, established from two acquired immunode-
ficiency syndrome (AIDS) patients with Epstein-Barr virus
(EBV)-negative Burkitt's lymphoma (BL). These B-cell clones
were representative of the respective neoplastic parental
clones, as determined by immunophenotypic and molecular
genetic analysis. The IgM MoAbs were highly specific for
the i determinant on red blood cells (cold agglutinins), but
bound none of the other eight self and nine foreign antigens
(Ags) tested, including those most commonly recognized
by natural antibodies or autoantibodies. Structural analysis
showed that the IgM MoAb VH segment sequences were
93.5% and 84.2% identical with that of the germline VH4-21
gene, which encodes the vast majority of cold agglutinins
that are specific for the i/I carbohydrate Ag and are produced
under chronic lymphoproliferative conditions. The HBL-2
MoAb VH4-21 gene segment was juxtaposed with 20P3 and
«M> genes and paired with a Vλ1 segment the sequence of
which was 95.5% identical to that of the germline Humlv117
gene; the HBL-3 MoAb VH4-21 gene segment was juxtaposed
with DXP'1 and JH5 genes and paired with a Vλ1 segment,
the sequence of which was 86.7% identical to that of the
germline Humlv1L1 gene. The high degree of conservation
of the VH4-21 gene in the human population, the nature of
the nucleotide differences in the expressed VH4-21 seg-
ments, and the presence of nucleotide substitutions in the
HBL-2 and HBL-3 IgM MoAb JH and/or Jλ segments sug-
gested that the MoAb V segments underwent a process of
somatic hypermutation. This was formally shown in the
HBL-3 MoAb VH segment, by differentially targeted polymer-
ase chain reaction amplification of the HBL-3 MoAb-produc-
ing cell genomic DNA. In addition, cloning and sequencing
of the genomic DNA from fibroblasts of the same patient
whose neoplastic B cells gave rise to the HBL-3 cell line
yielded a germline copy of the VH4-21 gene. Thus, the expres-
sion of VH4-21 gene products may be involved in a self Ag-
driven process of clonal B-cell expansion and selection asso-
ciated with BL in these AIDS patients.
© 1994 by The American Society of Hematology.

AUTOIMMUNE PHENOMENA can occur in associa-
tion with several human B-cell disorders, such as cold
agglutinin disease, 1,2 lymphoma, 3-6 and the B-cell expansion
and hypergammaglobulinemia occurring in human immuno-
deficiency virus (HIV)-infected patients. 7-9 Although the pre-
cise role of different self antigens (Ags) in the B-cell clonal
selection associated with the above pathologic conditions
remains to be defined, circumstantial evidence for a role of
self Ags in clonal expansion and selection in autoimmune
humans and mice has been provided. 10-16 The crucial role of
Ags in inducing clonal expansion and selection in the normal
B-cell repertoire is well documented. 17 " 2 ' Recently, it has
been suggested that Ag stimulation also plays a role in the
B-cell expansion and selection preceding and/or associated
with development of lymphomas of various histologic
types***" 4

The assessment of a potential role for Ags in the clonal 
B-cell expansion and selection associated with lymphoma 
or leukemia entails, first of all, the definition of the specificity 
and of VH and VL segment structure of the tumor-derived
antibody. The analysis of antibody specificity and VH and
VL segment structure depends on the availability of a homo-
geneous in vitro growing and Ig-producing tumor cell popu-
lation representative of the in vivo neoplastic clone. This is
a critical requirement in view of the findings of oligoclonal 
or polyclonal, not necessarily neoplastic, B-cell populations 
accompanying the predominant neoplastic clone, as found 
particularly in bioptic specimens of Burkitt's lymphoma 
(BL) emerging in patients with acquired immunodeficiency 
syndrome (AIDS). 25,26 Possibly due to the improved treat- 
ment and longer survival rate, these patients display a 60- 
fold increased incidence, relative to that expected for the 
general population, of lymphomas, mainly of the Burkitt's 
type. 27 

We analyzed the Ag reactivity and the structure of the VH
and VL segments of the IgM monoclonal antibodies (MoAbs)
produced by two spontaneously in vitro outgrowing cell lines 
established from 2 AIDS patients with Epstein-Barr virus 
(EBV)-negative BL. 28 The absolute identity between the in 
vitro growing monoclonal cell lines and their respective in 
vivo neoplastic clones was established by immunopheno- 
typic and molecular genetic analysis. The IgM MoAbs from 
both cell lines strongly bound to the i Ag on red blood cells 
(RBCs; cold agglutinins), but to none of the other self and 
foreign Ags tested. The structural correlate for such Ag-
binding specificity was provided by segments encoded by
VH4-21 and Vλ1 genes in somatically mutated configuration.
Thus, a process of selection by the i self Ags may have
played a role in the B-cell clonal expansion preceding and/
or associated with the development of BL in these AIDS
patients.

MATERIALS AND METHODS 

Generation and characterization of the monoclonal AIDS BL cell
lines. The HBL-2 and HBL-3 MoAb-producing cell lines were
established using B lymphocytes spontaneously outgrowing from
two tumors histologically and immunophenotypically classified as
small noncleaved cell lymphoma (SNCCL).** Both SNCCL arose in
patients with AIDS. Immunophenotypic analysis was performed by
fluorescence flow cytometry of isolated cells using a FACScan (Bec-
ton Dickinson Corp, Mountain View, CA) and a panel of labeled
murine MoAbs, including those to CD19, HLA-DR, CD10, CD21,
and CD5. 28 The clonality of the cell lines and their absolute relat-
edness to the tumors were determined by Ig gene rearrangement
analysis using a JH probe on HindIII, EcoRI, and BamHI DNA
digests. 29 The c-myc translocations were detected by cytogenetic
analysis. 28 The status of the c-myc locus was analyzed by hybridiza-
tion of EcoRI- and HindIII-digested DNA to the human c-myc probe
MC413RC, representative of the third exon of the c-myc gene. 30 The
presence of the EBV genome was investigated using a probe specific
for the EBV genomic termini (5.2-kb BamHI-EcoRI fragment iso-
lated from the fused BamHI terminal fragment NJ-het). 28 The pres-
ence of HIV sequences was investigated using the X7A/2 probe on
HindIII and SacI DNA digests, and that of HTLV-I sequences was
investigated using an HTLV-env probe on BamHI and PstI DNA
digests. 28

Analysis of the AIDS BL cell line-derived MoAb. The IgM
MoAbs produced by the HBL-2 and HBL-3 cell lines were analyzed
for their binding to polyclonal human IgG Fc fragment (Organon
Teknika-Cappel, Malvern, PA); calf thymus DNA (Sigma Chemical
Co, St Louis, MO); insulin (Sigma); human recombinant tumor ne-
crosis factor-α (TNF-α), TNF-β, and interleukin-1β (IL-1β; BASF
Biotech Corp, Cambridge, MA); human thyroglobulin; HIV-1 and
cytomegalovirus (CMV) and parvovirus B19 recombinant glycopro-
teins; lipopolysaccharide (LPS) and β-galactosidase from Esche-
richia coli (Sigma); phosphorylcholine chloride (Sigma); Pneumo-
coccus polysaccharides, including types 1, 3, and 4; and tetanus
toxoid (Massachusetts Public Health Biological Laboratories, Ja-
maica Plain, MA), using specific enzyme-linked immunosorbent
assay (ELISA) involving plates coated with 1 to 5 µg/mL of these
Ags. 31-34 The IgM MoAb specific i/I blood group cold agglutinin
activity was tested by hemagglutination using papain-treated and
untreated group O+ umbilical cord or adult human erythrocytes in a
100 µL reaction volume." Cold agglutinating titers were expressed
as the smallest amount (nanograms) of IgM MoAb agglutinating
10^7 papain-treated human RBCs. Finally, the presence of the cross-
reacting 9G4 idiotype on the IgM HBL-2 and HBL-3 MoAbs was
analyzed using the appropriate IgG2a rat antibody.'- 2 * 36

Cloning and sequencing of the MoAb V H and Vλ genes. Poly(A)+
RNA was isolated from the HBL-2 and HBL-3 MoAb-producing B-cell
lines using the Micro Fast-Track mRNA isolation kit (Invitrogen
Corp, San Diego, CA) and reverse-transcribed using Moloney murine
leukemia virus (MMLV) reverse transcriptase (Superscript
RNase H- Reverse Transcriptase; GIBCO BRL Life Technologies,
Gaithersburg, MD). cDNA (100 ng) was submitted to polymerase
chain reaction (PCR) in a volume of 50 μL containing 200 μmol/L
of each dNTP, 2.5 U of Taq polymerase (Perkin Elmer Corp, Norwalk,
CT), and 10 pmol of each primer. The following sense degenerate
primers encompassing a portion of the leader region of different
V H families plus an EcoRI site were used: ALT1 (V H I family)
[5' GGGAATTCATGGA<nW -TCIXGDC 3']; ALT2 (V H III family)
[5' GCGAATTCATGGAG(Ci>TTGGGCTGA(CG)CTXKj(CG)TIT(CI^ 3'];
ALT4 (V H IV family) [5' CHjGAATCATGAA(AG)CA(TC^ -(CT)CT(CG)C 3'];
and HI-3, an oligonucleotide primer encompassing a portion of the
leader sequence of the V H III family, [5' TTGGGCTGTGCTGGGTTTTCCT 3'].
The antisense primer consisted of the reverse complement
[5' CCGAATTCAGCCGAGGGGGAAAAGGGTTT 3'] of a 21-nucleotide
5' Cμ sequence plus an EcoRI site. The oligonucleotide primers
specific for the 5' portions of the Vλ chains were as follows:
VλI [5' ATG(GA)CC(TG)GCT(CT)CCCTCTCCTCCT 3'], and VλII-VI
(VλII, VλIII, VλIV, VλVI chains)
[5' ATG(AG)C(CDTGGACCC(CIl(AT)Crc(Cr)(TG)(TG)TT 3']. The antisense
Cλ chain primer consisted of the
[5' TTGGCTTGAAGCTCCTCAGAGGA 3'] oligonucleotide. For V H
and Vλ gene amplification, 30 cycles of PCR were performed under
denaturing, annealing, and extending conditions of 94°C (1 minute),
52°C (1 minute), and 72°C (2 minutes), respectively. PCR products
were sized and isolated on low melting agarose gel and ligated into
pCR 1000 plasmid vector (Invitrogen Corp). The ligation mixture
was used to transform INVαF' competent cells according to the
manufacturer's protocol. Recombinant clones were selected according
to the length of the insert and sequenced by the dideoxy
chain termination method using the Taq Track Sequencing Kit (Promega
Corp, Madison, WI). Each V H or Vλ sequence was derived
from the analysis of at least three independent clones. Differences
in nucleotide sequences among different recombinant clones were
observed in few cases (<0.001/base) and such variants were excluded
from the sequence analysis. DNA sequences were analyzed
using the software package of the Genetics Computer Group of the
University of Wisconsin, version 7.1, and a Model 6000-410 VAX
computer (Digital Equipment Corp, Marlboro, MA). DNA sequence
identity searches were performed using the GenBank database and
the FASTA method. 37
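
The degenerate primers above are written with parenthesized alternatives at the mixed positions. As an illustration only, the following Python sketch expands that notation into the individual oligonucleotides it stands for. The function name `expand_degenerate` is hypothetical, and the input is the VλI sense primer as transcribed in this copy of the text, which may itself carry scanning errors.

```python
from itertools import product


def expand_degenerate(primer: str) -> list[str]:
    """Expand a primer such as 'AT(GA)C' into every sequence it represents.

    Each parenthesized group lists the alternative bases at one position.
    """
    parts = []  # one list of alternatives per fixed segment or mixed position
    i = 0
    while i < len(primer):
        if primer[i] == "(":
            j = primer.index(")", i)
            parts.append(list(primer[i + 1:j]))  # alternatives at this position
            i = j + 1
        else:
            j = primer.find("(", i)
            j = len(primer) if j == -1 else j
            parts.append([primer[i:j]])  # fixed stretch, a single choice
            i = j
    return ["".join(combo) for combo in product(*parts)]


# VλI sense primer as printed above (assumed to be transcribed correctly).
V_LAMBDA_I_SENSE = "ATG(GA)CC(TG)GCT(CT)CCCTCTCCTCCT"

variants = expand_degenerate(V_LAMBDA_I_SENSE)
for v in variants:
    print(v)
```

With three two-fold mixed positions, the notation covers 2 x 2 x 2 = 8 distinct 23-mers, which is why a single degenerate oligonucleotide can prime several related leader sequences.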

Analysis of the putative germline IgV H segment that gave rise to
the expressed HBL-3 V H gene. Genomic DNA was extracted from
the monoclonal HBL-3 B cells and autologous fibroblasts obtained
from the same bioptic sample used for the generation of the tumoral
HBL-3 cell line. B-cell or fibroblast genomic DNA (100 ng) was
supplemented with the appropriate sense and antisense oligonucleotide
primers (10 pmol each). PCR amplification was performed in a
50 μL reaction volume using Taq polymerase under denaturing,
annealing, and extending conditions of 94°C (1 minute), 60°C (1
minute), and 72°C (1 minute), respectively, for 30 cycles. The oligonucleotides
used were as follows: (1) the sense Vh4-21 FR1 primer,
encompassing a portion (residues 10 to 27) of the FR1 sequence of
the germline Vh4-21 gene 38 and differing in two nucleotides from the
corresponding sequence of the expressed HBL-3 V H gene
[5' CTACAGCAGTGGGGCGCA 3']; (2) the sense HBL-3 leader primer,
encompassing a portion of the leader sequence of the HBL-3 V H
gene and differing in one nucleotide (C instead of G at position
-31) from the corresponding area of the germline V58 gene, 39 the
member of the V H IV gene family displaying the highest degree of
identity with the Vh4-21 gene; and (3) the antisense Vh4-21 FR3
primer, consisting of the reverse complement
[5' GTGTCCGCGGCGGTCACAGA 3'] of a FR3 sequence (residues 250 to 269)
shared by the expressed HBL-3 V H gene and the germline Vh4-21
gene. Part of the amplified DNA was fractionated through a 1.5%
agarose gel, transferred to a nylon membrane (Hybond; Amersham
Life Sciences, Arlington Heights, IL) and hybridized at 48°C with
the HBL-3 complementarity-determining region 1 (CDR1) oligonucleotide
probe labeled with [γ-32P]ATP (DuPont NEN Research
Products, Boston, MA) by T4 polynucleotide kinase. The HBL-3
CDR1 oligonucleotide encompassed a FR1-CDR1 sequence
[5' GTTCCGTGATTACCTCTGGACA 3'] (residues 84 to 105) of the
expressed HBL-3 V H gene, differing in seven bases from that of the

Table 1. Features of the AIDS BL and Related Cold Agglutinin MoAbs

                                          HBL-2            HBL-3
Tumor
  Source                                  Pleural fluid    Liver mass
  Histology                               SNCC             SNCC
  EBV DNA                                 No               No
  HIV DNA                                 No               No
  HTLV-I DNA                              No               No
  c-myc                                   t(8;14)          t(8;22)
MoAb
  H chain                                 μ                μ
  L chain                                 λ                λ
  Ag specificity                          i                i
  Smallest agglutinating amount§ (ng)                      0.65
V H Segment
  V H gene                                Vh4-21‖          Vh4-21‖
  Nucleotide (amino acid) identity* (%)   93.5 (88.6)      84.2 (80.4)

* Compared with the genomic germline sequences.
† The gene 20P3 has been reported by Schroeder et al 47; DXP'1 gene has been reported by Ichihara et al. 44
‡ The complete sequences of the genomic germline V L genes have been reported as follows: Humlv117 (Vλ1 subgroup), and Humlv1L1 (Vλ1 subgroup). 51
§ Smallest amount (in nanograms) of IgM MoAb agglutinating 10^7 papain-treated cord RBCs in a 100 μL reaction volume.
‖ The sequence of the genomic germline Vh4-21 gene has been reported by Sanz et al. 38
¶ Expected numbers of R mutations calculated as reported in the Results.



corresponding area of the germline Vh4-21 gene. After hybridization,
the membrane was washed twice with 2x SSC/0.1% sodium dodecyl
sulfate (SDS) at room temperature for 30 minutes and twice with
1x SSC/0.1% SDS at 54°C for 30 minutes. Autoradiography was
performed using Kodak XAR-5 film (Eastman Kodak Co, Rochester,
NY). Other amplified DNA was inserted into the pCR II plasmid
vector (Invitrogen Corp) for cloning and sequencing.
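
The primers and probe in this section are specified both by sequence and by the residue range they cover, and the antisense primer is given as a reverse complement. The short Python sketch below, offered only as a consistency check, recovers the sense-strand FR3 sequence from the antisense primer and compares each oligonucleotide length with its stated residue range. The helper names are hypothetical; the sequences and coordinates are those printed above.

```python
# Translation table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def residues_covered(first: int, last: int) -> int:
    """Number of residues in an inclusive range of positive coordinates."""
    return last - first + 1


# Oligonucleotides as printed in the Methods, with their stated residue ranges.
OLIGOS = {
    "Vh4-21 FR1 (sense)": ("CTACAGCAGTGGGGCGCA", 10, 27),
    "Vh4-21 FR3 (antisense)": ("GTGTCCGCGGCGGTCACAGA", 250, 269),
    "HBL-3 CDR1 probe": ("GTTCCGTGATTACCTCTGGACA", 84, 105),
}

for name, (seq, first, last) in OLIGOS.items():
    ok = len(seq) == residues_covered(first, last)
    print(f"{name}: {len(seq)} nt, range {first}-{last}, lengths agree: {ok}")

# Sense-strand sequence targeted by the antisense FR3 primer.
fr3_sense = reverse_complement(OLIGOS["Vh4-21 FR3 (antisense)"][0])
print(fr3_sense)
```

Each oligonucleotide length matches the number of residues in its stated range (18, 20, and 22 nucleotides), which supports the repaired sequences given above.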

RESULTS 

Characterization of the IgM MoAb-producing HBL-2 and 
HBL-3 cell lines. The features of the HBL-2 and HBL-3 
cell lines established from 2 AIDS patients with SNCCL 
and their identity with the respective primary tumoral tissues 
have been reported 28 and are summarized in Table 1. Both 
tumor samples and respective cell lines expressed surface Ig
μ and λ chains and the B-cell-restricted marker CD19. This
was consistent with the B-cell origin of the cell lines and
with the histologic diagnosis of SNCCL. 40 In addition, the
surface expression of CD10 but not CD21 was consistent
with the cell line phenotype of sporadic BL. 40 The monoclonality
of the AIDS BL cell lines and their absolute relatedness
to the respective tumors was formally established by
the analysis of the Ig H chain gene rearrangements and that
of the c-myc oncogene translocations. The Southern blotting
analyses of EcoRI- and HindIII-digested DNA were consistent
with two distinct patterns of c-myc activation. In the
HBL-2 cells, the breakpoint was located 5' (3 to 5 kb) to
the c-myc first exon; in the HBL-3 cells, the breakpoint was
within a ~4-kb region 3' of the c-myc exon 3. Both HBL-2
and HBL-3 cells were negative for EBV, HIV, and HTLV-I
sequences and so were their respective original tumor cells.

The Ag-binding activity of the IgM MoAbs produced by
the HBL-2 and HBL-3 cell lines. The HBL-2 and HBL-3
IgM MoAbs specifically bound to the i determinant on human
erythrocytes, as shown by their strong agglutination of
cord (smallest agglutinating doses: 4 and 0.6 ng/10^7 RBCs,
respectively), but not adult (no agglutination by HBL-2
MoAb; HBL-3 MoAb smallest agglutinating dose, 38.8 ng/10^7
RBCs) papain-treated human RBCs. The specificity of
the HBL-2 and HBL-3 IgM MoAbs was further strengthened
by the MoAbs' failure to bind to any of the eight self and
nine foreign Ags tested, including IgG Fc fragment, human
thyroglobulin, ssDNA, phosphorylcholine, insulin, human
recombinant TNF-α, human recombinant TNF-β, human recombinant
IL-1β, β-galactosidase and LPS from E coli, tetanus
toxoid, HIV-1, recombinant glycoproteins of CMV and
Parvovirus B19, and Pneumococcus polysaccharides (Table
1). Binding to each of these Ags by HBL-2 and HBL-3
MoAbs yielded an absorbance of less than 0.05 at 492 nm;
negative and positive controls were always less than 0.05
and more than 1.00, respectively. The HBL-3 but not the
HBL-2 MoAb expressed the cross-reacting idiotype defined
by the anti-idiotypic 9G4 MoAb. 1,2,36

The HBL-2 and HBL-3 MoAb V H segments. Figure 1
depicts the nucleotide (A) and deduced amino acid (B) se- 
quences of the HBL-2 and HBL-3 IgM-MoAb V H genes and 
that of the closest reported germline V H gene. The differ- 
ences between sequences are summarized in Table 1. The 
HBL-2 and HBL-3 MoAb V H gene sequences were 93.5% 
and 84.2% identical, respectively, to that of Vh4-21 gene, a 
member of the V H IV gene family. 38 Accurate inspection of 
the Vh4-21-related sequences available in GenBank
showed that the HBL-2 V H gene sequence displayed an iden- 
tical nucleotide difference at position 82, A instead of T, 
resulting in the same replacement of a Ser with a Thr, as 
found in the sequence of the Vh4-21 allele HumigvH4c. 41 
Twelve of the 19 nucleotide differences displayed by the 
HBL-2 V H gene sequence resulted in putative amino acid 
replacements, yielding replacement to silent (R:S) mutation 
ratios of 5.0 in the CDR and 1.1 in the framework regions 
(FR). Nineteen of the 46 nucleotide differences displayed 
by the HBL-3 V H sequence resulted in putative amino acid 
replacements, yielding R:S mutation ratios of 2.2 in the CDR 
and 0.3 in the FR. 
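
The identity percentages and R:S ratios quoted above follow from simple counts. The Python sketch below reworks them as an illustration, under stated assumptions: the comparison length of 291 nucleotides is taken from the end coordinate shown for Vh4-21 in Fig 1A, the replacement (R) and silent (S) counts come from Table 1, and the two HBL-2 silent counts, which are not legible in this copy of the table, are inferred from the totals given in the text (19 differences, 12 replacements, CDR R:S of 5.0). Truncating rather than rounding the ratios is also an inference, chosen because it reproduces the printed values.

```python
V_H_LENGTH = 291  # assumed comparison length (end coordinate in Fig 1A)


def percent_identity(differences: int, length: int = V_H_LENGTH) -> float:
    """Percent nucleotide identity, rounded to one decimal."""
    return round(100 * (1 - differences / length), 1)


def rs_ratio(r: int, s: int) -> float:
    """Replacement-to-silent ratio, truncated (not rounded) to one decimal."""
    return (10 * r // s) / 10


# (R, S) counts per region; HBL-2 silent counts are inferred, see lead-in.
COUNTS = {
    "HBL-2": {"CDR": (5, 1), "FR": (7, 6)},
    "HBL-3": {"CDR": (11, 5), "FR": (8, 22)},
}

for line, regions in COUNTS.items():
    total = sum(r + s for r, s in regions.values())
    print(line, "differences:", total, "identity:", percent_identity(total))
    for region, (r, s) in regions.items():
        print("  ", region, "R:S =", rs_ratio(r, s))
```

Under these assumptions the counts sum to the 19 and 46 differences reported for HBL-2 and HBL-3 and give the identities and ratios stated in the text.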

The HBL-2 and HBL-3 MoAb D and J H genes. The nucleotide
and deduced amino acid sequences of the HBL-2
and HBL-3 IgM MoAb D and J H genes, and those of their
closest germline D 42-46 and J H genes 42 are depicted in Fig 1.
The HBL-2 MoAb D gene sequence displayed some identity 
with that of the expressed fetal 20P3 D gene 47 ; the HBL-3 
MoAb D gene displayed a stretch of similarity to that of the 

Table 1 (continued). Features of the AIDS BL and Related Cold Agglutinin MoAbs

                                          HBL-2            HBL-3
V H Segment
  Nucleotide differences in CDR
    R (expected¶)                         5 (3.2)          11 (7.8)
    S                                                      5
  Nucleotide differences in FR
    R (expected¶)                         7 (10.1)         8 (26.2)
    S                                                      22
  D gene†                                 20P3             DXP'1 (inverted)
  J H gene                                J H 6            J H 5
Vλ1 Segment
  Vλ1 gene‡                               Humlv117         Humlv1L1
  Nucleotide (amino acid) identity* (%)   95.5 (92.8)      86.7 (80.6)
  Nucleotide differences in CDR
    R (expected¶)                         5 (2.7)          9 (8.0)
    S                                     4                8
  Nucleotide differences in FR
    R (expected¶)                         2 (6.9)          10 (20.8)
    S                                     2                12
  Jλ gene                                 Jλ2/Jλ3          Jλ1



reverse complement of the germline DXP'1 gene, suggesting
a possible inverted D gene joining origin (Fig 1C). 48 Both 
expressed D genes were flanked by untemplated nucleotide 
additions. The entire length of the D segment ranged from 
13 nucleotides in HBL-2 to 36 nucleotides in HBL-3. The 
HBL-2 and HBL-3 MoAbs used truncated and mutated 
forms of J H 6 and J H 5 genes, respectively (Fig 1C). The deduced
amino acid sequences of the D-J H genes are depicted
in Fig 1D, as segregated in CDR3 and FR4 stretches, according
to Kabat et al. 49 The CDR3 sequences were highly
divergent in length and composition. The HBL-2 and HBL- 
3 FR4 sequences were invariable in length and displayed 
two and one amino acid replacements, respectively. 

The HBL-2 and HBL-3 MoAb Vλ and Jλ genes. Figure
2 depicts the nucleotide (A) and deduced amino acid (B)
sequences of the HBL-2 and HBL-3 MoAb Vλ genes and
those of the closest reported germline Vλ genes. The differences
between sequences are summarized in Table 1. HBL-2
and HBL-3 MoAbs used two members of the Vλ1 subgroup,
the Humlv117 and Humlv1L1 genes, respectively. 14,50
When compared with the germline gene, the HBL-2
Vλ1 gene sequence displayed nine and four nucleotide
differences in the CDRs and FRs, respectively, yielding four
and two amino acid replacements, and R:S mutation ratios of
1.2 and 1.0, respectively. When compared with the germline
gene, the HBL-3 Vλ1 gene sequence displayed 39 nucleotide
differences. These were scattered throughout the CDRs and
FRs, yielding a total of 19 amino acid replacements and R:S
mutation ratios of 1.1 and 0.8, respectively. Figure 2 depicts
the nucleotide (C) and deduced amino acid (D) sequences
of the MoAb Jλ segments and their respective germline Jλ
templates. The HBL-2 MoAb used a Jλ2/Jλ3 segment in
germline configuration; the HBL-3 MoAb used a Jλ1 segment
with five nucleotide mutations resulting in three amino
acid replacements.

Somatic mutations in the HBL-3 MoAb V H segment. Because
of conservation of the Vh4-21 gene in humans, the
high number of nucleotide differences displayed by the
HBL-3 V H gene sequence when compared with that of the
Vh4-21 germline gene, and the detection of mutations in the
HBL-3 MoAb J H 5 and Jλ1 segments, we hypothesized that
the HBL-3 MoAb V H segment consisted of a somatically
mutated form of the Vh4-21 gene. PCR amplifications were
performed using ad hoc designed oligonucleotide primers
and genomic DNA from the HBL-3 cell line or autologous
fibroblasts. The sense Vh4-21 FR1 primer, encompassing an
FR1 sequence (residues 10 to 27) shared by the Vh4-21
segment and the expressed HBL-3 V H gene, was used in
conjunction with the antisense Vh4-21 FR3 primer, encompassing
an FR3 sequence (residues 250 to 269) shared by
the germline Vh4-21 and the expressed HBL-3 genes. The
two combined primers amplified DNA from both fibroblasts
and HBL-3 cells. The molecular size of the amplified product
(~260 bp) was consistent with that of the sequence spanning
residues 10 to 269 of the Vh4-21 gene sequence (Fig 3A,
lanes 1 and 2). The same antisense Vh4-21 FR3 primer was
also used to amplify fibroblast DNA, in conjunction with
the sense HBL-3 leader primer, encompassing a stretch of
the leader sequence of HBL-3 V H gene (residues -49 to
-25) and differing in only one nucleotide from that of the
corresponding area of the V58 gene, the V H IV family member
displaying the highest degree of identity with the Vh4-21
gene. The molecular size of the amplified product (~400
bp) was consistent with that of the sequence spanning residues
-49 to 269 of the HBL-3 V H gene, including the untranslated
intervening intron (Fig 3A, lane 3).
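
The expected product sizes follow from the residue coordinates of the primers. The Python sketch below works through the arithmetic as an illustration. It assumes that leader residues are numbered negatively with no residue 0 (so that -1 is immediately followed by 1), which is a common convention but is not stated in the text; the function name is hypothetical, and the intron estimate is only as precise as the approximate gel size of 400 bp.

```python
def span_length(first: int, last: int) -> int:
    """Number of residues from first to last inclusive.

    Assumes negative (leader) coordinates run up to -1 and that there is
    no residue numbered 0, so a range crossing the leader boundary has
    |first| + last residues.
    """
    if first < 0 < last:
        return last - first  # for example, -49..269 covers 49 + 269 residues
    return last - first + 1


# FR1 sense primer (starts at residue 10) to FR3 antisense primer (ends at 269).
fr1_fr3_product = span_length(10, 269)

# Leader sense primer (starts at residue -49) to FR3 antisense primer.
leader_fr3_exonic = span_length(-49, 269)

# Difference between the approximate gel size and the exonic residues,
# attributed here to the intervening intron (a rough estimate only).
OBSERVED_LEADER_PRODUCT_BP = 400
intron_estimate = OBSERVED_LEADER_PRODUCT_BP - leader_fr3_exonic

print(fr1_fr3_product, leader_fr3_exonic, intron_estimate)
```

The FR1 to FR3 span accounts for the product of about 260 bp. For the leader to FR3 product, the exonic residues fall short of the observed size by roughly 80 bp, which is the contribution the text attributes to the intron.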

The three DNA amplification products were analyzed for
their ability to hybridize with the [γ-32P]-labeled HBL-3
CDR1 oligonucleotide. This encompassed a stretch of the
HBL-3 V H gene FR1-CDR1 sequence that displayed seven
putative mutations when compared with the corresponding
germline Vh4-21 gene sequence. The [γ-32P]-labeled HBL-3
CDR1 oligonucleotide strongly hybridized with the ~260
bp DNA amplified from the HBL-3 cell line (Fig 3B, lane
1), but not with the ~260 or ~400 bp DNA amplified from
the autologous fibroblasts (Fig 3B, lanes 2 and 3, respectively).
To identify the autologous germline V H gene that
putatively gave rise to the expressed HBL-3 V H gene, the
product amplified from fibroblast DNA using the sense Vh4-21
FR1 and the antisense Vh4-21 FR3 primers was cloned






RIBOLDI ET AL



[Fig 1, panels A to D. Sequence alignments: (A) nucleotide and (B) deduced amino acid sequences of germline Vh4-21, HBL-3GL, HBL-3 FR1/FR3, HBL-3, and HBL-2, with the leader, FR1, CDR1, FR2, CDR2, and FR3 regions and the HBL-3 leader, Vh4-21 FR1, HBL-3 CDR1, and Vh4-21 FR3 oligonucleotide positions marked; (C) nucleotide and (D) deduced amino acid sequences of 20P3, DXP'1 (Inv), J H 6, and J H 5 aligned with the HBL-2 and HBL-3 D-J H sequences, segregated into CDR3 and FR4.]

Fig 1. Top clusters: nucleotide (A) and deduced amino acid (B) sequences of the V H genes used by the HBL-2 and HBL-3 IgM MoAbs. The
top sequence is given for comparison and represents the published germline V H gene (Vh4-21) displaying the highest degree of identity to the
expressed V H genes. The Vh4-21 gene belongs to the V H IV family. Dashes indicate identity. Solid lines on the top of each cluster depict CDR.
Underlined nucleotides define the sequences or the reverse complement sequences of the primers adopted for PCR amplification. HBL-3 GL
and HBL-3 FR1/FR3 are the sequences amplified from fibroblast and HBL-3 cell genomic DNA, respectively, using the combination of the Vh4-21
FR1 and the Vh4-21 FR3 oligonucleotide primers (see Materials and Methods for details). Bottom clusters: nucleotide (C) and deduced amino
acid (D) sequences of the D and J H genes used by the HBL-2 and HBL-3 IgM MoAbs. Germline D genes are given for comparison. Dashes
indicate identity. DXP'1 (Inv) is the reverse complement of the germline DXP'1 sequence. The present sequences are available from EMBL/
GenBank/DDBJ under accession numbers L29115 and L29116.



and sequenced. The sequences of the eight independent
clones were all identical to each other and to that of the
Vh4-21 germline gene throughout the overlapping area (residues
28 through 249) (Fig 1A and B; HBL-3GL sequence).
DNA amplified from the HBL-3 cell line using the sense
Vh4-21 FR1 and the antisense Vh4-21 FR3 primers was also
cloned and sequenced. The sequences of three independent
clones were identical to each other and to that of the expressed
HBL-3 V H gene throughout the overlapping area
(residues 28 through 249) (Fig 1A and B; HBL-3 FR1/FR3
sequence). These experiments proved that the expressed
HBL-3 V H segment was somatically point-mutated, and suggested
that Vh4-21 is the germline gene that gave rise to it.

Ag-selection of the IgV genes expressed by the HBL-2 and
HBL-3 BL cells. In the absence of negative or positive selective
pressure on a gene product, R and S mutations are
scattered randomly throughout the protein sequence. If a 
DNA segment displays a number of R mutations lower than 

[Fig 2, panels A to D. Sequence alignments: (A) nucleotide and (B) deduced amino acid sequences of germline Humlv117 aligned with HBL-2 and of germline Humlv1L1 aligned with HBL-3, with the leader, FR1, CDR1, FR2, CDR2, FR3, and CDR3 regions marked; (C) nucleotide and (D) deduced amino acid sequences of germline Jλ2/Jλ3 aligned with HBL-2 and of germline Jλ1 aligned with HBL-3, segregated into CDR3 and FR4.]

Fig 2. Top clusters: nucleotide (A) and deduced amino acid (B) sequences of the Vλ genes used by the HBL-2 and HBL-3 IgM MoAbs. In
each cluster, the top sequence is given for comparison and represents the published germline Vλ gene displaying the highest degree of
identity with the expressed Vλ genes. Dashes indicate identity. Solid lines on the top of each cluster depict CDR. The Humlv117 and Humlv1L1
genes belong to the Vλ1 subgroup. Bottom clusters: nucleotide (C) and deduced amino acid (D) sequences of the Jλ segments used by the
HBL-2 and HBL-3 MoAbs. Dashes indicate identity. The present sequences are available from EMBL/GenBank/DDBJ under accession numbers
L29113 and L29114.



that expected by chance only, it is likely that pressure to
maintain the germline-encoded protein structure was exerted.
Conversely, if a DNA segment displays a number of R mutations
higher than that expected by chance only, it is likely
that a positive pressure to select R mutations was exerted.
The numbers of expected R mutations in the HBL-2 and
HBL-3 MoAb V H and Vλ segment CDRs and FRs were
calculated using the formula $n \times R_f \times CDR_f$ (or $FR_f$), where
$n$ is the total number of observed mutations, $R_f$ is the expected
proportion of R mutations (0.75), 51 and $CDR_f$ (or $FR_f$) is the
relative size of the CDRs (or FRs) (0.23 and 0.77 for Vh4-21
CDRs and FRs, respectively; 0.29 and 0.71 for Vλ1 CDRs
and FRs, respectively). Both HBL-2 and HBL-3 MoAb V
segments displayed higher and lower numbers of R mutations
in the CDRs and FRs, respectively, than those theoretically
expected (Table 1). Because of the primary role played
by the Vh4-21 segment in the binding to the i/I Ag, we
calculated the probabilities that the excess and the scarcity
of R mutations arose by chance in the HBL-2 and HBL-3
MoAb V H segment CDRs and FRs, respectively, using the
binomial distribution model $P = \frac{n!}{k!\,(n-k)!}\, q^{k} (1-q)^{n-k}$,
where $q$ is the probability an R mutation will locate

Fig 3. Evidence for somatic mutations in the HBL-3
V H gene. (A) Ethidium bromide staining of amplified
DNA after fractionation in agarose gel electrophoresis.
Using the Vh4-21 FR1 and Vh4-21 FR3
oligonucleotide primers, amplification products of
identical and appropriate size were obtained by
priming genomic DNA from the HBL-3 cells (lane 1)
and autologous fibroblasts that had been established
in primary culture from the BL bioptic sample
of the patient whose neoplastic B cells gave rise to
the HBL-3 cell line (lane 2). Using the HBL-3 leader
and Vh4-21 FR3 oligonucleotide primers, a product of
appropriate size was amplified by priming genomic
DNA from fibroblasts (lane 3). (B) Southern blot hybridization
of the PCR products shown in (A) with
32P-labeled oligonucleotide HBL-3 CDR1 probe encompassing
an FR1-CDR1 sequence of the HBL-3
MoAb V H gene. A strong positive hybridization signal
was detected only with DNA amplified from the HBL-3
MoAb-producing B-cell line (lane 1), but not with
DNA amplified from autologous fibroblasts (lanes 2
and 3).



to the CDRs (q = 0.23 × 0.75) or FRs (q = 0.77 × 0.75),
and k is the number of observed R mutations in the CDRs
or FRs. 52 The likelihood that the excess R mutations arose
by chance in the V H segment CDR were P - A 1 in HBL-2
MoAb and P = .06 in HBL-3 MoAb. The probability that
the scarcity of R mutations in the V H segment FR resulted
from chance were P = .03 in HBL-2 MoAb and P =
.00000002 in HBL-3 MoAb. In their original report on the
application of the binomial distribution model to the analysis
of the R point-mutations in Ag-selected IgV segment CDRs,
Shlomchik et al 51 suggested that the observed number of FR
R mutations should be doubled to account for the fact that
some of these mutational events will never be observed because
they are deleterious to the Ig structure. Although this
correction was inferred from some experimental observations,
it is approximate and may not be applicable as such
to all Ig V H genes. When the analysis of the V H segment
somatic mutation pattern was performed with the adjustment
of doubling the number of observed FR R mutations, the
probabilities that excess R mutations arose by chance in the
V H segment CDR were P = .49 and P = .35 in HBL-2
MoAb and HBL-3 MoAb, respectively.
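
The two calculations described above, the expected number of R mutations and the binomial probability of observing k R mutations in a region, can be sketched in a few lines of Python. This is an illustration of the formulas as stated in the text, not a reconstruction of the authors' computation: the function names are hypothetical, the example uses the HBL-2 V H counts (19 observed mutations, 5 R mutations in the CDR), and results obtained this way need not equal the published figures exactly, since rounding and counting conventions may differ.

```python
from math import comb

R_FRACTION = 0.75  # expected proportion of R mutations, as given in the text


def expected_r(n: int, region_fraction: float) -> float:
    """Expected R mutations in a region: n x Rf x (relative region size)."""
    return n * R_FRACTION * region_fraction


def binomial_probability(n: int, k: int, q: float) -> float:
    """P = [n! / (k! (n - k)!)] q^k (1 - q)^(n - k)."""
    return comb(n, k) * q**k * (1 - q) ** (n - k)


# Relative sizes of the Vh4-21 CDRs and FRs, as given in the text.
CDR_FRACTION, FR_FRACTION = 0.23, 0.77

# HBL-2 V H segment: 19 observed mutations, 5 R mutations in the CDR.
n_mutations, k_cdr = 19, 5
q_cdr = CDR_FRACTION * R_FRACTION

print("expected R in CDR:", expected_r(n_mutations, CDR_FRACTION))
print("expected R in FR: ", expected_r(n_mutations, FR_FRACTION))
print("P(k = 5 in CDR):  ", binomial_probability(n_mutations, k_cdr, q_cdr))
```

A small probability for an excess of R mutations in the CDR would argue for positive selection, and a small probability for a scarcity of R mutations in the FR would argue for pressure to preserve the framework, which is how the values in the text are interpreted.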

DISCUSSION 

In the present studies, we established from bioptic speci- 
mens of 2 AIDS patients with BL two MoAb-producing cell 
lines representative of the respective tumors, and analyzed 
the Ag-binding activity and the V segment structure of these 
MoAbs. We found that both IgM MoAbs were cold aggluti- 
nins highly specific for the i blood group determinant, and 
both MoAbs bore Ag-combining sites consisting of point-mutated
Vh4-21 segments in conjunction with Vλ1 segments.



The exquisite specificity of the HBL-2 and HBL-3 IgM
MoAb cold agglutinins for the i repetitive N-acetyllactosamine
units was strengthened by the MoAbs' failure to bind
any of the other eight self Ags and nine foreign Ags tested.
The putative use of the Vh4-21 gene segment by the HBL-2
and HBL-3 MoAbs is consistent with the use of the same
segment by the majority of the reported cold agglutinins
from patients with idiopathic cold agglutinin disease, FL, and
Waldenstrom's macroglobulinemia. 1,53-56 A primary role of
the Vh4-21 segment in the binding to the i Ag is further
supported by the divergence in composition and length of
the H chain CDR3 sequences as well as by the heterogeneity
of the V L segments of the present two IgM and the 10 reported
Vh4-21+ cold agglutinins, which use a VκI segment
three times, a VκII segment once, a VκIIIa segment once, a
VκIIIb four times, and a Vλ segment once. 1,54-56 Thus, the
Vh4-21 gene restriction in the cold agglutinin system may
result from a selection process based on an inherent affinity
of this V H gene product for the i/I carbohydrate structure. 1,2,54-56
Nevertheless, the Vh4-21 segment is not an absolute
requirement for i/I-dependent RBC agglutination, because
cold agglutinins using V H segments of the V H III family
have been reported. 57 The restricted usage of the Vh4-21 by
anti-i/I cold agglutinins is intriguing and may be related to
the overrepresentation of the Vh4-21-expressing clones in
the normal B-cell repertoire, as determined by the Vh4-21-related
9G4 idiotype studies in the circulating blood of
adults, as well as cord blood and fetal tissues. 4 In this regard,
it is not known whether these circulating Vh4-21-expressing
B cells also have anti-i/I specificity. If this were the case,
one could speculate that the abundant representation of Vh4-21-expressing
B cells in the periphery results from positive
selection by i/I Ag, which are present not only on RBCs but
also on lymphocytes. 58 Expression of the i Ag on human
erythrocytes is developmentally regulated. It is maximal in
fetal and neonatal life and decreases in adult life, in which
the expression of the I Ag prevails. 59

An Ag-driven process of clonal selection may play a role 
in the emergence and/or expansion of certain neoplastic B 
lymphocytes. Consistent with their putative germinal center 
origin, FL B cells, which represent the neoplastic equivalents 
of the elements recruited in a secondary Ag-specific re- 
sponse, display somatic mutations that resemble in nature 
and distribution those characteristic of an affinity maturation 
process. 22-24 A recent thorough documentation of the somatic
mutation and clonal evolution of an FL expressing a Vh4-21
gene, the antigenic specificity of which had not been
determined, showed at least three amino acid mutations, the
anti-i/I "characteristic" Gly to Asn mutation at position 31,
a Val to Ile mutation at position 71, and a Ser to Thr mutation
at position 83, which are shared by the HBL-2 and HBL-3
MoAb Vh4-21 segments. 24 A role for clonal selection by
self Ag during the evolution of anti-"Pr2"-specific B-cell
lymphoma has been documented in detail. 3,6 The Pr2 Ag is
a sialoglycoprotein and provides, along with the multiple N- 
acetyllactosamine i/I Ag, the target for autoimmune phenom- 
ena that occur in association with several human clonal B- 
cell disorders. 

The sequences of the DNA amplified from the HBL-3
MoAb-producing cell line and autologous fibroblast genomic
DNA, using the HBL-3 leader, Vh4-21 FR1, and Vh4-21
FR3 primers, as well as the differential hybridization of the
HBL-3 CDR1 oligonucleotide (encompassing an FR1-CDR1
sequence of the HBL-3 MoAb V H segment) with the above
amplification products, formally proved the mutated status
of the HBL-3 V H segment, and suggested that Vh4-21 was
the germline gene that gave rise to it. The somatically mutated
status of the HBL-3 and, possibly, HBL-2 MoAbs was
further strengthened by the high degree of conservation of
the Vh4-21 gene sequence in humans, and the extension of
the point-mutations to the, in general, highly conserved, J H
and/or Jλ segments. An Ag-selection of the point-mutations
in the HBL-2 and HBL-3 MoAb V H segments was suggested
by the differential R:S mutation ratios in the CDR and FR
(HBL-2 MoAb, 5.0 and 1.1, respectively; HBL-3 MoAb, 2.2
and 0.3, respectively) and the accumulation in the CDR of
the HBL-2 and/or HBL-3 V H segments of amino acid replacements
that are shared by other anti-i/I cold agglutinins
and might have increased the affinity of the Vh4-21 segment
for the i Ag, including the Gly to Asp mutation at position
31, which is shared by the FS-1, FS-2, FS-4, and KAU V H
cold agglutinins, 54,55 the Ser to Thr mutation at position 35,
which is shared by FS-1, FS-4, and FS-6 cold agglutinins,
and the His to Tyr at position 53, which is shared by the
FS-3 cold agglutinin. 55 However, a positive clonal selection
of R mutations in the HBL-2 and HBL-3 MoAb V H segment
CDRs was not further substantiated by the statistical analysis
according to the binomial distribution model with the correction
for FR R mutations, as proposed by Shlomchik et al. 52
This finding may be consistent with a putatively inherent
anti-i/I activity of the unmutated Vh4-21 gene product, and,
perhaps, a clonal selection against R mutations in the HBL-2
and HBL-3 MoAb V H segments CDRs, similar to that
shown for other B-cell tumor anti-RBC autoantibodies. 6 The
substitution of the Val with an Ile in the HBL-3 MoAb V H
segment FR1 Ala-Val-Tyr (residues 23 to 25) triplet, which
provides the structural correlates for the anti-idiotypic 9G4
antibody binding, as recently shown by Potter et al, 60 is
possibly responsible for the lack of 9G4 reactivity of the
HBL-2 MoAb.

In the present AIDS-associated BLs, it is unclear whether 
the initiation of the anti-i/I Ag autoantibody response consti- 
tuted a crucial event in the neoplastic transformation. The 
putative anti-self Ag clonal expansion and selection may 
have preceded the genetic accident, ie, c-myc proto-oncogene 
chromosomal translocation. 61,62 Alternatively, in these BLs,
the specific B-cell expansion and selection may have fol- 
lowed the chromosomal translocation, resembling the series 
of events that have been paradigmatically illustrated in rela- 
tionship to bcl-2 proto-oncogene chromosomal t(14;18) 
translocation by Zelenetz et al 23 in a FL for which, however, 
a specific Ag could not be identified. Knowledge of the 
sequential order of activating, proliferating, and trans- 
forming events, including c-myc translocation and activation, 
Ag-dependent B-cell amplification, somatic hypermutation, 
and clonal selection, is crucial for a better understanding of 
the molecular pathogenesis of AIDS BL and, possibly, other 
BLs. These issues could be best addressed by the use of a 
tumor-specific Ig H chain CDR3 sequence oligonucleotide 
to identify tumor-related Ig V H -D-J H sequences in nonmalig- 
nant B-cell progenitors. 
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Abstract— Peripheral blood B lymphocytes have been isolated from healthy individuals who were 
immunized with lymphocytes from HLA-incompatible donors and transformed with Epstein-Barr
virus to produce human monoclonal cell lines specific for human HLA molecules. The cell lines have
been previously characterized and are known to bind to various class I and class II alloantigens. In
this report we describe the molecular characterization of the heavy and light chain variable region
gene segments that are utilized by these monoclonal antibodies. Using the polymerase chain reaction
and primer pairs specific for the respective constant region and V H or V L family, rearranged variable
region gene segments were amplified from cDNA from individual cell lines. Products were then
subcloned, sequenced and analysed for gene usage and apparent somatic mutation. The results show
that the V H 3 gene family predominates in a group of six heavy chains (four out of six) with one V H 1
and one V H 4 gene segment. The light chain variable region gene family usage is more diverse with 2
Vκ3, 1 Vκ1, 2 Vλ2 and 1 Vλ3. The extent of apparent somatic mutation is minimal, relative to our
previous observations in a group of high affinity human monoclonal antibodies specific for pathogenic 
organisms. 

Key words: HLA, alloantigen, alloantibody, variable region, J region, D segment, human antibody
repertoire, human monoclonal antibody, apparent somatic mutation. 



INTRODUCTION 

One of the unique and critical features of the humoral 
immune response is the innate ability to construct an 
immunoglobulin molecule with virtually any specificity. 
The formation of an antibody response begins with the 
germline rearrangement of variable (V H ), diversity (D)
and joining (J H ) gene segments for the heavy (H) chain
and variable (V L ) and joining (J L ) gene segments for the
light (L) chain and ends with the antigen-driven selection 
of a B lymphocyte which possesses a unique binding 
specificity. In the interim the association of the heavy and 
light chains and the initiation of a number of mechanisms 
results in a repertoire of antibody-expressing B cells with 
an essentially infinite number of specificities (Tonegawa, 
1983). The gene segments which comprise a complete 
immunoglobulin molecule are found within three separate
gene complexes (heavy chain, κ light chain and λ light
chain), all of which are located on distinct chromosomes
and consist of multiple germline gene segments.
These gene segments are grouped into gene families based
on nucleotide sequence homology. There are seven V H
(reviewed in Pascual and Capra, 1990; van Dijk et al.,
1993); seven Vκ (Zachau, 1989); and at least 10 Vλ (Anderson
et al., 1984; Williams and Winter, 1993; Stiernholm
et al., 1994; Chuchana et al., 1990) gene families. A considerable
amount of knowledge exists regarding the
rearrangement process and the subsequent differentiation 
of a B lymphocyte, but one predominant question 
remains unanswered: why do humans maintain 50-100 
distinct variable region germline gene segments for each 
chain? 

In order to address this question a number of human 
immunoglobulin repertoire studies involving the struc- 
tural analysis of the H and L chain variable region gene 
segments have been reported. These include myeloma 
proteins (Capra and Kehoe, 1975), fetal rearrangements 
(Schroeder et al., 1987; Cuisinier et al., 1989; Schroeder
and Wang, 1990; Pascual et al., 1993), cord blood
rearrangements (Mortari et al., 1992, 1993), hybridomas
and Epstein-Barr virus (EBV)-transformed B cells and
combinatorial antibody libraries producing a wide variety
of autoantibodies (reviewed in Dersimonian et al.,
1990); alloantibodies (Hughes-Jones et al., 1991; Thompson
et al., 1991; Larrick et al., 1989b); and antibodies
which bind to a number of different exogenous antigens,
i.e. pathogenic microorganisms (Newkirk et al., 1988;
Larrick et al., 1989a; Felgenhauer et al., 1990; Andris et
al., 1991, 1992, 1993; Burton et al., 1991; Marasco et al.,
1990; Barbas et al., 1992, 1993; Zebedee et al., 1992;
Ikematsu et al., 1993; Barrett et al., 1992; Schroeder et
al., 1992; Insel et al., 1992; Moran et al., 1993; Scott et
al., 1992; Gillies et al., 1989; Persson et al., 1991;
Ayala-Avila et al., 1993; Bogard et al., 1993; Rioux et al., 1994;
Lewis et al., 1993). Some circumstances suggest that there
is some bias in gene expression, i.e. expression of different 
gene families does not reflect the size of the family, nor 
are functional members of any given family expressed 
equally (Pascual and Capra, 1992). Such gene segment
bias has been well documented in the murine system, as
demonstrated by fetal rearrangements (Yancopoulos et
al., 1984; Perlmutter et al., 1985), and antibodies specific
for antigens such as dextran and galactan (Schilling et
al., 1980; Rudikoff et al., 1983). In human antibodies this
bias has been seen predominantly in fetal rearrangements
(Schroeder et al., 1987; Cuisinier et al., 1989; Schroeder
and Wang, 1990; Pascual et al., 1993), autoantibodies
(reviewed in Dersimonian et al., 1990; Pascual and Capra,
1992) and Haemophilus influenzae type b capsular polysaccharide
antibodies (reviewed in Insel et al., 1992; Scott
et al., 1992). Due to the prevalence of V region cross-reactive
idiotypes (CRI) in several different autoimmune
diseases and the study of V segment gene usage in human 
monoclonal antibodies, it has been hypothesized that 
multiple gene segments exist for the generation of distinct 
repertoires of B cells. However, studies involving the
structural analysis of monoclonal antibodies to viral and 
bacterial antigens would suggest that many of the same 
variable region gene segments utilized in the afore- 
mentioned repertoires are also recruited in response to 
exogenous antigens. On the contrary, some human V H 
gene segments seem to be over-represented in the auto- 
immune repertoire. The V4-34 (V H 4-21) gene segment, 
for example, has only been found in autoantibodies, particularly
cold agglutinins (Pascual et al., 1991, 1992b),
anti-DNA antibodies (van Es et al., 1991), RF (Silberstein
et al., 1991; Pascual et al., 1992a) and human red
blood cell specific alloantibodies (Thompson et al., 1991).
In the case of cold agglutinins, this gene segment is
responsible for the cross-reactive idiotype specificity
characteristic of the I/i response and is the primary gene
segment found to encode this antigen reactivity in patients
with cold agglutination disease (Pascual et al., 1991,
1992b; Silberstein et al., 1991; Grillot-Courvalin et al.,
1992; Leoni et al., 1991). One might argue that the restriction
is a direct result of the homogeneous nature of the
antigen, i.e. carbohydrate. However, in a group of struc- 
turally distinct blood group antigens, including the A 
antigen, the Rh C, c, D, E, e and G antigens, and the
Kidd antigens Jka and Jkb, this gene segment was found
to represent 64% of the IgM antibodies and 21% of the
IgG antibodies (Thompson et al., 1991). This suggested
a possible restriction in the human anti-red blood cell 
alloantibody response as well. 

In order to further address this issue, we have char- 
acterized the heavy and light chain variable region gene 
segments expressed by six human monoclonal alloan- 
tibodies which bind various human HLA class I and class 
II alloantigens (Pistillo et al., 1986, 1987, 1988, 1989,
1991, 1993; Mazzoleni et al., 1989, 1991). These EBV-transformed
cell lines were derived from healthy volunteers
that were repeatedly immunized with blood from
other individuals who were HLA-disparate. We did not
find the V4-34 gene segment expressed within this group
of antibodies; instead, the results reflect what has been
observed in the majority of circumstances, a predominance
of the V H 3 heavy chain gene family and, to a
lesser extent, the Vκ3 and Vλ2 light chain gene families.



MATERIALS AND METHODS 

Isolation and characterization of anti-class I and -class II
producing cell lines

The isolation and characterization of the human class
I- and class II-reactive monoclonal cell lines has been
described previously (Pistillo et al., 1987, 1988, 1989,
1991, 1995; Mazzoleni et al., 1989, 1991). Briefly, healthy
volunteers were immunized by transfusing with aliquots
of whole blood from HLA-disparate donors at weekly
intervals. Mononuclear cells were then isolated from
these volunteers and transformed with EBV. Culture
supernatants were screened for reactivity against donor-derived
B lymphocytes. The specificity of each cell line is
listed in Table 1. Cell lines MP1, MP9 and MP12 produce
monoclonal IgM/λ antibodies, while cell lines MP6,
MP10 and MP14 produce monoclonal IgM/κ antibodies.
Each cell line was derived from a different donor as follows:
MP1, Z. L.; MP6, P. G.; MP9, O. O.; MP10, F. G.;
MP12, M. A.; MP14, V. F. The period of time between
the last immunization and the sampling of the cells was
as follows: MP1, 1 year; MP6, 4 years; MP9, 6 years;
MP10, 5 years; MP12, 1 year; and MP14, 2 years.

Oligonucleotides

The oligonucleotide primers used in the PCR amplifications
were synthesized on an Applied Biosystems
DNA synthesizer (Foster City, CA). The oligonucleotide
sequences are shown in Table 2. Degenerate positions are
denoted by (X/X). Rearranged V H gene segments were
amplified using the family-specific leader sequence primers
in conjunction with the HUMCμ constant region
primer. Rearranged Vλ gene segments were amplified with
degenerate framework 1 primers (Songsivilai et al., 1990)
in conjunction with the HUMCλ constant region primer,
designed to amplify all of the human λ constant region
gene segments. Rearranged Vκ gene segments were ampli-



Variable region gene usage in human alloantibodies

Table 1. Summary of V H and V L gene usage in MP cell lines

Cell   Specificity   V H gene  Closest   Per cent  J H gene  V L gene  Closest    Per cent  J L gene
line                 family    germline  homology  segment   family    germline   homology  segment

MP1    DQB1*0201     V H 3     V3-13     95.5%     J H 2     Vλ3       VIII.1     95.7%     Jλ3
MP6    A*3002        V H 3     V3-23     97.6%               Vκ1       VκO2/O12   99.7%     Jκ2
MP9    A*03          V H 1     V1-18     97.6%               Vλ2       DPL10      97.2%     Jλ3
MP10   DRB1*01       V H 4     VH4.18    96.6%               Vκ3       L16        96.9%     Jκ4
       DRB1*09
       DRB1*1001
       DRB1*15
       DRB1*16
MP12   DRB1*11       V H 3     M72       96.9%               Vλ2       hslv2046   98.0%
MP14   DRB1*08       V H 3     56p1      96.6%     J H 5     Vκ3       A27        97.6%
       DRB1*12



Table 2. Oligonucleotide sequences

Name     Sequence 5'→3'                                       Specificity

LVH1     ATGGACTGGACCTGGAGGATC                                Primes V H 1 gene segments from the 5' end of the leader
LVH3     CTCACCATGGAGTTTGGGCTG                                Primes V H 3 gene segments from the 5' end of the leader
LVH4     ATGAAACACCTGTGGTTC                                   Primes V H 4 gene segments from the 5' end of the leader
HUMCμ    AAGGGTTGGGGCGGATGCACTCCC                             Specific for the 5' region of human mu constant region
LVK1     ATGGACAa/C)GA(G/T)GG(T/C)C(C/T)C(G/A)CTCAG           A set of 2 degenerate oligonucleotides which prime
         ATGGACAC(A/C)AGAG(T/C)CC(T/C)C(G/A)                  Vκ1 gene segments from the 5' end of the leader
LVK3     ATGGAAACCCCAGCGCAG                                   A set of 2 oligonucleotides which prime
         ATGGGGTCCCAGGTTCAC                                   Vκ3 gene segments from the 5' end of the leader
HUMCκ    GACAGATGGTGCAGCCAC                                   Specific for the 5' region of human κ constant region
VLAM1    CA(C/G)TCTCAGCTGAC(G/T)CA(A/G)CC(T/C/A/G)(C/G)CCTC   Set of three degenerate oligonucleotides
VLAM2    TC(C/G)TATCAGCTGAC(G/T)CA(A/G)CC(T/C/A/G)CCCTC       which prime Vλ gene segments
VLAM3    AATTTTCAGCTGAC(G/T)CA(A/G)CC(T/C/A/G)CACTC1          from the beginning of framework 1
HUMCλ    TGTGGCCTTGTTGGCTTGAAG                                Specific for all human λ constant region gene segments




fied with the family-specific leader sequence primers in
conjunction with the HUMCκ constant region primer.
Due to the large size of the Vκ1 and Vκ3 gene families, the
leader sequences fall, roughly, into two groups; therefore,
two primers were designed for amplification of members
of these families.
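
The (X/X) notation for degenerate positions can be expanded mechanically into the individual sequences that a degenerate primer represents. The following is a minimal sketch, assuming the slash-separated alternatives are always enclosed in parentheses; the function name `expand_degenerate` is hypothetical, and the example string is the VLAM2 primer as printed in Table 2.

```python
import re
from itertools import product


def expand_degenerate(primer):
    """Expand a primer written with (X/Y) degenerate positions into all variants."""
    # Split while keeping the parenthesised groups as separate tokens.
    parts = re.split(r"(\([ACGT/]+\))", primer)
    choices = []
    for part in parts:
        if part.startswith("("):
            choices.append(part[1:-1].split("/"))  # alternatives at this position
        elif part:
            choices.append([part])  # fixed stretch of sequence
    return ["".join(combo) for combo in product(*choices)]


vlam2 = "TC(C/G)TATCAGCTGAC(G/T)CA(A/G)CC(T/C/A/G)CCCTC"
variants = expand_degenerate(vlam2)
print(len(variants))  # 2 * 2 * 2 * 4 = 32 distinct sequences
print(variants[0])
```

The count of variants gives a quick sense of how degenerate a primer pool is, which matters when primer concentration is specified per primer name rather than per unique sequence.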

First-strand cDNA synthesis

Total RNA was extracted from a frozen cell pellet by
using RNA STAT-60 (Tel-Test "B" Inc., Friendswood,
TX). This is a single-step isolation based on the methods
of Chomczynski and Sacchi (1987) and Kedzierski and
Porter (1991) which utilizes phenol and guanidinium thiocyanate
in a monophase solution. First-strand cDNA
synthesis was performed using oligo-dT as the primer and
high concentration AMV reverse transcriptase (Promega
Corp., Madison, WI) according to a modified protocol
of Gubler and Hoffman (Ausubel et al., 1989).



Polymerase chain reaction 

Polymerase chain reactions (PCR) were performed
essentially via the method recommended by Perkin-Elmer
Cetus. Two of 30 μl of cDNA (nanogram quantities)
were added to a 200 mM solution of each dATP,
dCTP, dGTP and dTTP, with 600 ng of each primer
and 2.25 U of Taq DNA polymerase (Promega Corp.,
Madison, WI). PCR cycles consisted of the following
conditions: one cycle of denaturation at 94°C for 4 min,
annealing at 55°C for 2 min and extension at 72°C for 2 min;
then 39 cycles of denaturation at 94°C for 1 min, annealing at
55°C for 2 min and extension at 72°C for 2 min. Amplifications
were carried out in a Programmable Thermal
Controller (MJ Research Inc., Watertown, MA).
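
The cycling conditions above can be written down as a small data structure, which makes the total programmed hold time easy to check. This is an illustrative sketch only: the variable and function names are hypothetical, and ramp times between temperatures, which depend on the instrument, are ignored.

```python
# Each step is (name, temperature in degrees C, hold time in minutes).
FIRST_CYCLE = [("denaturation", 94, 4), ("annealing", 55, 2), ("extension", 72, 2)]
MAIN_CYCLE = [("denaturation", 94, 1), ("annealing", 55, 2), ("extension", 72, 2)]

# The program is a list of (steps, number of repeats), as described in the text.
PROGRAM = [(FIRST_CYCLE, 1), (MAIN_CYCLE, 39)]


def total_minutes(program):
    """Sum the hold times over all cycles, ignoring temperature ramps."""
    return sum(
        repeats * sum(minutes for _, _, minutes in steps)
        for steps, repeats in program
    )


print(total_minutes(PROGRAM))  # 8 + 39 * 5 = 203 minutes of hold time
```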

Isolation, cloning and sequencing of the amplified products 

Amplified products were size selected on a 1% agarose
gel, ligated into the EcoRV site of Bluescript phagemid
vector (Stratagene, La Jolla, CA), transformed into
CaCl2-competent XL-1 Blue bacteria (Stratagene) and
screened by the blue/white colony screening method.
Miniprep DNA was prepared from white colonies utilizing
the Wizard Miniprep system (Promega Corp.,
Madison, WI) and digested with Xba I and Xho I to
determine the size of the ligated product. ssDNA was
generated from positive clones following superinfection
with M13K07 helper phage (Biorad, Hercules, CA) as
described previously (Pascual et al., 1990). Sequencing
was carried out on a minimum of four to six clones, from
a single amplification, using the dideoxy chain termination
method (Sanger et al., 1977) with a modified
version of T7 DNA polymerase (Sequenase, United
States Biochemical, Cleveland, OH) (68) and the M13
universal primer. Gel electrophoresis was performed
using Long Ranger gel solution (J.T. Baker Inc., Phillipsburg,
NJ). Plasmid DNA containing the heavy and
light chain variable region genes from cell lines MP6,
MP12 and MP14 was also sequenced on an Applied
Biosystems automated sequencer, model 373A (Applied
Biosystems, Foster City, CA) using both the M13 universal
primer and the M13 reverse primer. Completed
sequences were analysed using DNASTAR (DNASTAR
Inc., Madison, WI) and the combined EMBL/Genbank
database.

RESULTS 

Properties of anti-HLA cell lines and monoclonal antibodies

Six monoclonal B lymphoblastoid cell lines have been
generated after EBV-immortalization of in vivo HLA-sensitized
human B lymphocytes followed by repeated
selection and subcloning of anti-HLA specific antibody-secreting
cells. All of the cell lines that were obtained
secrete alloantibodies whose HLA specificity was analysed
by microlymphocytotoxicity and cytofluorimetry
against a panel of well-characterized HLA-homozygous
B lymphoblastoid cell lines, HLA-typed peripheral blood
B lymphocytes and HLA-transfected murine L cells. Segregation
analysis within informative families has also
been carried out.

In contrast to murine monoclonal antibodies, all of
these human monoclonal antibodies recognize polymorphic
HLA specificities, including class I and class II
specificities (Table 1). Therefore, they are useful tissue
typing reagents, in particular MP1, the first published
monoclonal antibody identifying the celiac disease-associated
DQB1*0201 allele; MP10, which allows identification
of the rare DRB1*1001 phenotype; and MP6,
which identifies the HLA-A*3002 (Pistillo et al., 1995).
The availability of multiple HLA class II β chain first
domain nucleotide and amino acid sequences allowed
us to correlate the reactivity pattern of the anti-class II
monoclonal antibodies with the presence of specific
amino acid residues or clusters of residues that can be
involved in the polymorphic epitopes recognized by the
anti-class II monoclonal antibodies. This correlation
demonstrated that amino acid residues unique to the
DQB1 or DRB1 chains can be most likely involved in the
formation of the epitopes recognized by the serologically
monospecific antibodies MP1 and MP12. For the antibodies
recognizing more than one HLA specificity, such
as MP10 and MP14, amino acid residues shared by
different DRB1 chains can contribute to the formation
of the antibody binding site. In the case of MP12 the
critical role played by glutamic acid at position 58 in the
DRB1 chain has been clearly demonstrated by testing the
ability to bind to wild-type or site-specific mutagenized
transfectants (Clohe et al., 1992).

The immunoglobulin isotype was determined by cytoplasmic
immunofluorescence staining of the clones with
immunoglobulin chain-specific antibodies. All of the
clones produce IgM (up to 1 μg/ml of culture supernatant),
but utilize different light chains and express
members of different V H and V L families (see below).

Heavy chain variable regions 

The complete nucleotide sequence of each heavy chain
variable region is shown in Fig. 1. The V H gene segment
expressed by MP1 is a member of the V H 3 gene family.
It is 95.5% identical to the germline V H 3 gene segment,
V3-13, which is equivalent to DP48, V H 13-2 and 38p1
(Table 1; Matsuda et al., 1993; Chothia et al., 1992;
Berman et al., 1988; Schroeder et al., 1987). The second
cell line, MP6, also utilizes a V H 3 gene segment which
demonstrates the closest sequence homology to the V3-23
(equivalent to DP47 and V H 26) germline gene segment
(97.6%, Table 1; Matsuda et al., 1993; Chothia et al.,
1992). The V H gene segment utilized by MP9 is a member
of the V H 1 gene family and displays 97.6% sequence
homology to the germline gene segment, V1-18 (DP14;
Table 1; Matsuda et al., 1993; Chothia et al., 1992). MP10
expresses a V H 4 gene segment that is most closely related
to the germline gene segment, V H 4.18 (Table 1; Sanz et
al., 1989). The remaining two cell lines express members
of the V H 3 gene family, distinct from each other and from
those utilized by cell lines MP1 and MP6. MP12 expresses
a V H 3 gene segment which is most closely related to
another rearranged gene segment, M72, found in a fetal
liver cDNA library (96.9%, Table 1; Schroeder and
Wang, 1990). MP14 is 96.6% identical to another V H 3
gene segment commonly found in the fetal repertoire,
56p1 (Table 1; Schroeder et al., 1987).
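
The percent identity figures quoted above are, in essence, the fraction of matching positions between an expressed sequence and its closest germline counterpart. A minimal sketch of that arithmetic follows, assuming the two sequences have already been aligned to the same length without gaps; the function name and the toy sequences are hypothetical and are not taken from the figures.

```python
def percent_identity(expressed, germline):
    """Percent of positions that match between two pre-aligned sequences."""
    if not germline or len(expressed) != len(germline):
        raise ValueError("sequences must be non-empty and pre-aligned to equal length")
    matches = sum(a == b for a, b in zip(expressed.upper(), germline.upper()))
    return 100.0 * matches / len(germline)


# Toy example: one mismatch in eight positions.
print(percent_identity("ACGTACGT", "ACGTACGA"))  # 87.5
```

The percent nucleotide difference is simply 100 minus this value.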

The comparison between each expressed V H gene and
the closest germline (or rearranged) gene segment is
shown in Fig. 2. When each of these expressed sequences
is compared to the most closely related germline or
rearranged gene segment, the percent nucleotide difference
ranges from 2.4 to 4.5%. These nucleotide differences
are mainly distributed between complementarity



(a) MP1



SGGGLVQPGGSLRLSCA** 



M^tt ACT TTG ACT. « "F^ TST GCA AGA G« »6 TTC GAT ACT M» SCT CM TAC 



— JH2 



GOT CCC TTC GAT CTC TGG GGC CGG TGC ACC CTG GTC ACT GTC TCG TCA ' 
GPFDLWGRCTLVTVSS 



(b) MP6

-LEADER 



ATg'gAG TTT GGG CTG AGC TGG CTT TTT CTI GTG GOT ATA TTA AAA GGT GTC CAG TGT GAG GTG CAS CTG TTG GAG 
MEFGLSWLFLVAILKGVQCEVQL1B 



-VH3- 



ACT GGG GGA GGC TTG GTG CAG CCS GGG GGG TCC CTG AGA CTC TCC TGT GCA GCC TCT GGA TTC ACC TTT AGC AGC 
TGGGtVQPGGSLRLSCAASGPTrSS 



TAT GCC ATG AGC TGG GTC CGC CAG GCT CCA GGG AAG GGG CTG GAG TGG GTC TCA GCT ATT AGT GGT AGT GGT GOT 
v*MSWVROAPGKGLEUVSAISGSGG 



CAG ACA TAC TAC GCA GAC TCC GCG AAG GGC CGG TTC ACC ATC TCC AGA GAC AAT TCC AAG AAC ACG TTG TAT CTG 
QTYTADSAKGRFTOSfcDBSKNTLYL 



CAA ATG AAC AGC CTG AGA GCC GAG GAC ACG GCC GTA TAT TAC TGT GCG AAA GAS AGG GGT TAC TAT GAT AGT CCG 
OMSSLRAEDTAVYYCAKERGYYDSP 



-> < JIB- 



TAT GCT TTA GAT ATC TGG GGC CAA GGG ACA ATG GTC ACC GTC TCT TCA 
YALDI«GQGTMVTVSS 

(c) MP9

LEADER- 



ATG GAC TGG ACC TGG AGG ATC CTT TTC TTG GTG GCA GCA GCA ACA GGT GCC CAC TCC CAG GTT CAG CTG GTG CAG 
M DWTHRI L PLVAAATGAB S QV Q1VQ 



-VH1 — 



TCT GGA GCT GAG GTG AAG AAG CCT GGG GCC TCA GTG AAG GTC TCC TGC AAG GCT TCT GGT TAC ACC TTT ACC AGC 
SGAEVKKPGASVKVSCKA5GYTFTS 



*****CDR1** **•■*- ■ — — — — - 

TAT GGC ATC AGC TGG GTG CGA CAG GCC CCT GGA CAA GGG CTT GAG TGG ATG GGA TGG ATC AGC GCC TAC AAT GGA 
YGISWVRQAPGttGLEWMGWI SAYMG 

fc**#*#*+******+(*t)D2 *■******+*******■**■***— — ————————— " ———————— 

AAC ACA AAC TAT GCA CAG AAG CTC CAG GGC AGA GTC ACC ATG ACC ACA GAC ACA TCC ACG AGC ACA GCC TAC ATG 
NTNYAQKLQGRVTHTTDTSTSTAYK 



GAG CTG~AGC~ A gTcTG AAA TCT GAC GAC ACG GCC CTG TAT TAT TCT GGT ASA CAA TGG TTC GGG GAG TCG ATC TAC 
BLRSLKSDDTAVYYCGRQWFGESIY 

> < JH3 > 

TAC TAC TAC ATG GAC GTC TGG GGC AAA GGG ACC ACG GTC ACC GTC TCC TCA 
^YVMDVWGKGTTVTVSS 

Fig. 1. cDNA sequence of the anti-HLA heavy chain variable regions. The leader, V H , D and J H gene
segments are denoted by the dashed line; CDR1 and CDR2 are indicated by asterisks. (a) cDNA
sequence of MP1; (b) cDNA sequence of MP6; (c) cDNA sequence of MP9; (d) cDNA sequence of
MP10; (e) cDNA sequence of MP12; and (f) cDNA sequence of MP14. The sequences have been assigned
Genbank accession numbers L38431, L38433, L38435, L38425, L38427 and L38429, respectively.

(d) MP10



< LEADER " > < * 

ATG AAA CAC CTG TGG TTC TTC CTC CTG CTG GTG GCG GCT CCC AGA TGG GTC CTG TCC CAG CTG CAG CTG CAG GAG 
HK fl LWF FLLLVAAPRWVI,SQl»QLQE 

?CG~ GG<T CGA* CGA* CTC GTG^AAG CCT TCG GAG ACC CTG TCC CTC ACC TGC ACT GTC TCT GTG GGC TCC ATT AGT AGT 
SGPGLVKPSSTLSLTCTVSGGSISS 

_ _____+*#•**■+**-***#■**■*+** + 

+ ***#******•*#—. — , " 

AGT AGT CAC TAC TGG GGC TGG ATC CGC CAG CCC CCA GGG AAG GGA CTG GAG TGG ATT GGG ACT ATC TAT TAT AST 
SSHY»GWIROFPGKGLEWIGTIYyS 



+*************CDR2*****************"***'****'*** ' " 

GGG AGC ACC TAC CAC AAC CCG TCC CTC AAG AGC CGA GTC ACC ATA TCC GTA GAC ACG TCC AAG AAC CAG TTC TCC 
GSTVHNPSLKSRVTISVDTSXHQFS 



CTG AAG CTG AGC TCT GTG ACC GCC ACA GAC ACG GCT GTG TAT TTC TGT GCG AGA CAC CTC GGG CCC TGG GAA AAC 
LK LSSVTATDTAVrFCARHLGPWEN 

TGG GGC CAG GGA ACC CTG GTC ACC GTC TCC TCA 
WGQGTLVTVSS 

(e) MP12 

< - LEADER " > K 

ATG GAG TTT GGG CTG AGC TGG GTT GTC CTC GTT GCT CTT TTA AGA GGT GTC CAG TGT CAG GTG CAG CTG GTG GAG 
MEFGLSBVVLVALLRGVQCQVQLVE 



-VH3 



TCT GGG GGA GGC GTG GTC CAG CCT GGG AGG TCC CTG AGA CTC TCC TGT GCA GCC TCT GGA TTC TCC TTC AGC AGA 
SGGGVVQPGRSLRLSCAASGFSPSR 

*-+*CDIU******* 

TAT GCT ATG TAC TGG GTC CGC CAG GCT CCA GGC AAG GGG CTG GAG TGG GTG GCA GTT ATA TCA TAT GAT GGA AGT 
YAMYWVRQAPGKGLEWVAVISYDGS 

+++++ * + *■+ **CDR2* *■*■***■*+**■***■******** * 

AAT AAA TAT TAT GCA GAC TCC GTG AAG GGC CGA TTC ACC ATC TCC AGA GAC AAT TCC AAG AAC ACG CTG TAT CTG 
NKYYADSVXGRFTISRDNSKNTLYL 

CAA ATG GAC AGC CTG AGA GCT GAC GAC ACG GCT GTG TAT TAC TGT GCG GGA GGA GTG GTT ATT ATA TTT AGT CGA 
QMDSI.RADDTAVYYCAGGVVI I PSR 

CTT GAT TAC TGG GGC CAG GGA AAC CTG GCC ACC GTC TCC TCA 
LDYWGQGN LATV5S 



(f) MP14

< LEADER " > < 

ATG GAG TTT GGG CTG AGC TGG GTT TTC CTC GTT GCT CTT TTA AGA GGT GTC CAG TGT CAG GTG CAA CTG GTG GAG 
MEFGLSWVFLVALLRGVQCQVQIVE 



-VK3 



TCT GGG GGA GGC GTG GTC CAG CCT GGG AGG TCC CTG AGA CTC TCC TGT GCA GCC TCT GGA TTC ACC TTC ACT AGC 
SGGGVVQPGRSLRtSCAASGFTFTS 

***♦•****•**»#**********+*** 

*"#**-*C£)R2#** *-** — 

TAT GCT ATG CAC TGG GTC CGC CAG GCT CCA GGC AAG GGA CTG GAG TGG GTG GCA GTT ATG TCA TTT GAT GGA AGC 
YAMHWVRQAPGKGLEBVAVMSFDGS 



AAA AAA TAC TAC GCA GAC TCC GTG AAG GGC CGA TTC ACC ATC TCC AGA GAC AAT TCC AAG AAC ACA CTG TTT CTG 
VKYYnDSVKGRFTlSRDNSKWTLFL 



CAA ATG AAC AGC CTG AGA GCT GAG GAC ACG GCT ATT TAT TAC TGT GCG AGA GAT CAA ATG GGT TGG TTC GAC CCC 
QMNSLRAEOTAIYYCARDQMGWFDP 

TGG GCC CAG GGA ACC CTG GTC ACC GTC TCC TCA 
WGQGTLVTVSS 



Fig. 1. continued.



[Fig. 2: nucleotide alignment panels (a) to (f)]

Fig. 2. Comparison of the nucleotide sequences of anti-HLA heavy chain variable regions and the
closest germline or rearranged gene segment. Identity between the sequences is denoted by the periods;
CDR1 and CDR2 are indicated by asterisks. (a) MP1VH vs V3-13 (Matsuda et al., 1993); (b) MP6VH
vs V3-23 (Matsuda et al., 1993); (c) MP9VH vs V1-18 (Matsuda et al., 1993); (d) MP10VH vs VH4.18
(Sanz et al., 1989); (e) MP12VH vs M72 (Schroeder and Wang, 1990); and (f) MP14VH vs 56p1
(Schroeder et al., 1987).

J. S. ANDRIS et al.



determining region 1 (CDR1), CDR2 and framework 3
(FR3), with occasional changes in FR1 or FR2. With the
exception of MP9, in which the majority of nucleotide
differences do not result in a change in amino acid (i.e.
silent mutations), the remainder of the expressed V H gene
segments exhibit a ratio of replacement to silent substitutions
ranging from 2:1 to 7:1 (Table 3). Furthermore,
the majority of the replacement substitutions fall within
CDRs and FR3, suggestive of an antigen-driven selection
of these B cells. Overall, silent nucleotide changes are
rare, but fall mainly within framework regions, which
may be more reflective of polymorphism than somatic
mutation. These observations are not unique in the study
of human immunoglobulin variable region gene
segments, as shown by the numerous aforementioned
studies. The main difference between the current observations
and our previous observations is the overall
extent of apparent somatic mutation. In a group of
human IgG antibodies that are specific for a number
of different exogenous antigens, the percent nucleotide
difference with a corresponding germline gene segment is
generally more extensive than observed with these IgM
antibodies. This observation may reflect the results of a
recent study in which various tonsillar B cell subsets were
assessed for the amount of somatic mutation that had
been introduced into the expressed variable region gene
segments (Pascual et al., 1994). It was clearly shown that
there are naive subsets of B cells, which express only
surface IgM and IgD, and express V H gene segments that
are essentially germline in origin; while there are other
IgM-expressing subsets (germinal center) that have begun
to accrue point mutations. These appear to precede the
IgG-expressing germinal center subset that has
accumulated a substantial number of somatic mutations.
On average, the number of base-pair substitutions among
the IgG transcripts from germinal center-derived cells
was two-fold compared to the IgM transcripts from the
same B cell subpopulation.
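
The distinction between replacement (R) and silent (S) substitutions can be made concrete by comparing codons against the standard genetic code. The sketch below is an illustration under simplifying assumptions, not the authors' procedure: both sequences are assumed to be upper-case, in frame and of equal length, and each differing nucleotide is scored on its own against the germline codon, which is a simplification when a codon carries more than one change. The function name and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_r_s(germline, expressed):
    """Count replacement and silent nucleotide differences, codon by codon."""
    if len(germline) != len(expressed) or len(germline) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    replacement = silent = 0
    for i in range(0, len(germline), 3):
        g, e = germline[i:i + 3], expressed[i:i + 3]
        for pos in range(3):
            if g[pos] != e[pos]:
                # Introduce only this one change into the germline codon.
                single = g[:pos] + e[pos] + g[pos + 1:]
                if CODON_TABLE[single] != CODON_TABLE[g]:
                    replacement += 1
                else:
                    silent += 1
    return replacement, silent


# Toy example: GGT -> GAT changes Gly to Asp (R); AGC -> AGT keeps Ser (S).
print(count_r_s("GGTAGCTAT", "GATAGTTAT"))  # (1, 1)
```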

Light chain variable regions 

The complete light chain variable region sequence of
each cell line is shown in Fig. 3. Three of the cell lines
express light chains of the λ isotype, while three express κ
light chains (Table 1). The variable region gene expressed
in the MP1 cell line is a member of the Vλ3 gene family.
It exhibits the closest homology to the germline gene,
VIII.1, which is equivalent to DPL23 (95.7%, Table 1;
Combriato and Klobeck, 1991; Williams and Winter,
1993). The second cell line, MP6, utilizes a Vκ gene segment
belonging to the Vκ1 family. This expressed
sequence is 99.7% identical to the VκO2/O12 gene
segments, two identical germline gene segments which
have been duplicated within the human κ locus (Table 1;
Pargent et al., 1991). The third cell line, MP9, expresses
a gene which is a member of the Vλ2 gene family. This
expressed sequence displays 97.2% sequence identity to
the germline gene, DPL10 (equivalent to hslv2066; Table
1; Williams and Winter, 1993; Irigoyen et al., 1994).
MP10 appears to utilize the germline gene segment, L16
(Humkv328), with a sequence identity of 96.9% (Table
1; Huber et al., 1993; Liu et al., 1989). Like MP9, MP12
also expresses a member of the Vλ2 family, distinct from
that expressed in the MP9 cell line. MP12 Vλ displays the
closest sequence homology with the germline Vλ2 gene,
hslv2046 (98%, Table 1; Irigoyen et al., 1994). The last



Table 3. Summary of nucleotide differences between expressed and germline V H gene segments in anti-HLA antibodies

             FR1            CDR1           FR2            CDR2           FR3
Cell line    R/S   % MUT    R/S   % MUT    R/S   % MUT    R/S   % MUT    R/S   % MUT

MP1          2/0   2.2%     2/0   13.3%    0/1   2.4%     2/0   4.2%     6/1   7.3%
MP6          1/1   2.2%     0/0   0.0%     0/0   0.0%     4/0   7.8%     1/0   1.0%
MP9          0/0   0.0%     0/1   6.7%     0/0   0.0%     0/2   3.9%     3/1   4.2%
MP10         3/0   3.3%     1/0   4.8%     0/1   2.4%     2/1   6.3%     2/0   2.1%
MP12         1/1   2.2%     2/0   13.3%    0/0   0.0%     0/2   3.9%     3/0   3.1%
MP14               2.2%     0/0   0.0%     0/1   2.4%     3/0   5.9%     3/1   4.2%
(a) MP1

^ _ _*********** 

^"^CAG^r^'cAG'c^^C^ GTgT TCC* GTG TCC CCA G^CAG ACA GCC AGC ATC ACC TGC TCT GGA GAT 

_ - ******* 

Z'Z^ T V 7 TTT 

ATg'aaTcgT^ ACT CTG ACC ATC AGC 

MKRPSGIPERFSGSNSGHTATixx* 

*CDR3** ***************** ** 



0X3- 



GGG~ ACc"cAG*GCT~ ATG GAT GAG GCT GAC TAT TAG TGT CAS GCG TGG GAC AGC AGC CTT GTG GTA TTC GGC GGA GGG 



ACC AAG CTG ACC GTC CTA 
T K X. T V X. 



(b) MP6



ATG GAC ACC AGA GtTcCC Aci'cAG CTc'cT^cTcTC CTG CTA CTC TGG CTC CGA GGT GCC AGA TGT GAC ATC CAG 



MDTRVPTOL 

-VK1 



ATg'aCC CAG TCT CCA TCC TCC CTG TCT GCA TCT GTA GGA GAC AGA GTC ACC ATC ACT TGC CGG GCA ACT CAG AGC 
MTQSPS3L3ASVGDRVTITCRASQS 

_***♦* **CDR2**** 

ATT^SC* AGC* TAT*TTA AAt'tGG TAT CAG CAG AAA CCA GGG AAA GCC CCT AAG CTC CTG ATC TAT GCT GCA TCC ACT 
- „ - « , MMvr*n«-OftKAPKLLITAAS3 



TTG CAA AST GGG GTC CCA TCA A6G TTC AGT GGC ACT GGC TCT GGG ACA GAT TTC ACT CTC ACC ATC AGC AGT CTG 
L qsGVP5RFSGSGSGTDFTLTISSL 



-> < JK2- 

**CDR3* *********** ******** 



CAA CCT GAA GAT TTT GCA ACT TAC TAC TGT CAA CAG AGT TAC AGT CCC CCT CCG GTA TAC ACT TTT GGC CAG GGG 
QPEDFATYYCQQSYSPPPVVTF GQG 



ACC AAG CTG GAG ATC AAA 
T K I» B I K 



(c) MP9



GAT TCG TAT~CAG CTG ACG CAG CCT CCC TCC GTG TCT GGG TCT CCT GGA CAG TCG ATC ACC ATC TCC TGC ACT GGA 
DSYQLT3PPSVSGSPGQSITISCTG 



ACC AGC AGT GAT GTT GGG AGT TAT AAC CTT GTC TCC TGG TAC CAA CAG CAC CCA GGC GAA GCC CCC AAA CTC ATC 
TSSDVGSYHLVSWYQQHPGEAPRLI 

,_#****+*****CDR2************ — " • — ~ ~ 

ATT TAT GAG GTC AGT AAG CGG CCC TCA GGG GTT TCT AAT CGC TTC TCT GGC TCC AAG TCT GGC AAC ACG GCC TCC 
IY EVSXRPSGVSNRFSGSKSGMTAS 

> < — 

****************CDR3*************** 

CTG ACA ATC TCT GGG CTC CAG GCC GAG GAC GAG GCT GAA TAT TAC TGC TGC TCA TAT GCA GCT GAT AGC ACT GTG 
LTISGLQAEDEAEYYCCSYAADSTV 



**# 

ATA TTC GGC GGA GGG ACC AAG CTG ACC GTC CTA 
I FGGGTKLTVL 



Fig. 3. cDNA sequences of anti-HLA light chain variable regions. The leader, VH, D and JH are
denoted by the dashed line; CDR1, CDR2 and CDR3 are indicated by asterisks. (a) cDNA sequence
of MP1; (b) cDNA sequence of MP6; (c) cDNA sequence of MP9; (d) cDNA sequence of MP10; (e)
cDNA sequence of MP12; and (f) cDNA sequence of MP14. The sequences have been assigned
GenBank accession numbers L38432, L38434, L38436, L38426, L38428 and L38430, respectively.






(d) MP10



Wg"^"aCc"c^"^6 CAG CTT CTC TTC CTC CTG CTA CTC TGG CTC CCA GAT ACC ACT GGA GAA ATA GTG ATG ACG 

*************-**cdri******** 

«l^TOTlCTl»riCT"m GGgIaTagTgCc'aCC CTC TCC TGC AGG GCT AGT CAG ACT GTT AGC 
QSPATLSVSPGERATLSCRASgTVb 

*********CDR2.** # * ****** 

AGC* AAC* TTA* GCc"tGG~ TAC~ CAG~CAG~ AAa" CCt" GGc" CAG~ GCT~ CCC* AG6 CTC CTC ATC TAT GGT GCA TCC ACC AGG GCC 
SMLAWVQQKPGQAPRLLIYGASTRA 

*£^Tkk^ « t A " T *f ™ A f A f T ^ G T f 

TGIPARFSGSGSRTEFTLTISSLQS 



♦CDR3* 



GAA GAT TTT GCA GTT TAT TAC TGT CAG CAA TAT TAT AGC TGG CCT CCG CGA CTC ACT TTC GGC GGA GGG ACC AAG 

_ _ _ _ «*«sr*ftnVVRWPPALTFGvG? 



GTG GAG ATC AAA 
V E I K 



(e) MP12



_*********** 



CAG TCT CAG CTG ACG CAG CCA GCC TCC GCG TCC GGG TCT CCT GGA CAG TCA GTT ACC ATC TCC TGC ACT GGA ACC 
QSQLTQPASASGSFGQ3VTISCTGT 



********** — 



***********CDR1**** ************ 

GGC AGT GAC GTT GGT GGT TAT AAC TAT CTC TCC TGG TAC CAA CAG CAC CCA GGC AAA GCC CCC AAA CTC ATG ATT 
GSDVGGYNYVStfYQQKPGKAPKLMI 



_****• *******Q)R2********'*** 



TAT GAG GTC AGT AAG CGG CCC TCA GGG GTC CCT TAT CGC TTC TCT GGC TCC AAG TCT GGC AAC ACG GCC TCC CTG 
VBVSKRPSGVPYRFSGSKSGHTASL 



*CDR3* 



ACC GTC TCT GGA CTC CGG GCT GAG GAT GAG GCT GAT TAT TAC TGC AGC TCA TAT GCA GGC AAC AAC AAT TTG GTA 
TVSGLRAEDEADYYCSSYAGNHNLV 



TTC GGC GGA GGG ACC AAG GTG ACC GTC CTA 
FGGGTKVTVL 



(f) MP14

ATG GAA ACC CCA GCG CAG CTT CTC TTC^CTC^CTG CTA CTC TGG CTC CCA GAA AGC ACC GGA GAA ATT GTG TTG ACG 
HBTPAQLLFLLLLWLPESTGEIVLT 

yyQ _ ******* ****** C flm*******+** 

CAG TCT CCA GGC ACC CTG TCT TTG TCT CCA GGG GAA AGA GCC ACC CTC TCC TGC AGG GCC AGT CAG ACT GTT ACC 
QSPGTLSLSPGERATLSCRASQSVT 

******************* •♦****** a >R2+****** 

AGC AGC TAC TTA GCC TGG TAC CAG CAG AAA CCT GGC CAG GCT CCC AGG CTC CTC ATC TTT GGT GCA TCC AGC AGG 
S SYLAWYQQKP GQAP R L L I FGASSR 

******* — — — _ — — — " ~ ~ 

GCC ACT GGC ATC CCA GAC AGG TTC AGT GGC AGT GGG TCT GGG ACA GAC TTC ACT CTC ACC ATC AGC AGA CTG GAG 
ATGIPDRFSGSGSCTDFTLTISRLE 

***************CDR3**************** 

CCT GAA GAT TTT GCA GTG TAT TAC TGT CAG CAC TAT GGT AGG TCT GCG TAC GCT TTT GGC CAG GGG ACC AAG CTG 
? EDFAVYYCOHYGRSAYAFGQGTKI. 

GAG ATC AAA 

e r k 



Fig. 3, continued.
cell line, MP14, expresses a Vκ3 gene segment which is
97.6% identical to the germline gene, A27 (Table 1;
Straubinger et al., 1988).

The comparison between each of the expressed VL
genes and the closest germline gene is shown in Fig. 4.
When each of the expressed gene segments is compared
to a putative germline sequence, the percent nucleotide
difference ranges from 1.1 to 4.2% (the differences in the
beginning of framework 1 in the λ genes are likely to be
due to the degeneracy of the PCR primers and have not
been considered in the homology analysis). As seen in
Table 4, the overall ratio of replacement to silent
substitutions ranges from 2:1 to 8:1. Similar to the heavy
chain, this is indicative of antigen-driven selection. In
general, CDR3 exhibits the majority of the changes, while
CDR1 and CDR2 show fewer changes than observed
in the heavy chain.
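
The replacement-to-silent (R/S) ratios and percent-mutation values discussed above can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `count_r_s`, the toy sequences, and the rule that every base change in a codon is scored by that codon's net effect on the amino acid are all assumptions made for illustration. Both sequences are assumed to be in frame, ungapped and of equal length.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def count_r_s(germline, expressed):
    """Return (replacement, silent, percent_mutated) for two aligned sequences.

    Hypothetical helper: each differing base is counted as replacement (R)
    if the codon containing it encodes a different amino acid in the
    expressed sequence, and as silent (S) otherwise.
    """
    germline, expressed = germline.upper(), expressed.upper()
    if len(germline) != len(expressed) or len(germline) % 3 != 0:
        raise ValueError("sequences must be in frame and of equal length")
    replacement = silent = 0
    for i in range(0, len(germline), 3):
        g_codon, e_codon = germline[i:i + 3], expressed[i:i + 3]
        diffs = sum(a != b for a, b in zip(g_codon, e_codon))
        if diffs == 0:
            continue
        if CODON_TABLE[g_codon] != CODON_TABLE[e_codon]:
            replacement += diffs
        else:
            silent += diffs
    percent = 100.0 * (replacement + silent) / len(germline)
    return replacement, silent, percent


# Toy 12-nucleotide example (not taken from the figures):
# CAA -> CAG is silent (Gln -> Gln); AGT -> ACT is a replacement (Ser -> Thr).
r, s, pct = count_r_s("TGTCAACAGAGT", "TGTCAGCAGACT")
print(f"R/S = {r}/{s}, % MUT = {pct:.1f}%")
```

Applied region by region (FR1, CDR1, and so on), a calculation of this kind yields entries of the form reported in Tables 3 and 4, where % MUT is the total number of substitutions divided by the length of the region.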

Two of the three κ-expressing cell lines appear to have
additional nucleotides at the V-J junction. It was
originally thought that length variation in light chains was a
result of recombination slippage rather than N segment
addition, as terminal deoxynucleotidyl transferase (TdT)
activity is negligible at the time of light chain
rearrangement. However, it has been demonstrated that N segment
addition can contribute to overall light chain diversity
(Victor and Capra, 1994). In comparing MP6Vκ to the
germline Vκ O2/O12 gene segment, four nucleotides at the
end of the V gene do not appear to be derived from the
germline sequence (Fig. 4B). It has been suggested that
one type of junctional diversity, P nucleotides, can be
attributed to the transfer of the terminal nucleotide(s)
from the antisense strand to the sense strand during
recombination, forming a palindromic template (Lafaille
et al., 1989). The first two nucleotides, GG, may be the
result of P nucleotides, derived from the complementary
CC at the 3' end of the germline gene. However, the last
two nucleotides, TA, cannot be attributed to either the V
gene recombination signal sequence (RSS) or the J gene
segment. Therefore it is possible that these may be the
result of TdT activity. MP10Vκ also exhibits four
additional nucleotides at the V-J junction (Fig. 4D). The
first G, again, may be the result of P nucleotide addition,
derived from the complementary terminal C. However,
the CGA cannot be attributed to either the V gene
segment or the J gene segment or their corresponding RSS.
Again, they may be the result of N segment addition by
TdT. However, as with the heavy chain variable region,
one must use caution in assessing the germline origin,
somatic mutation, and junctional diversity, as one cannot
be certain of polymorphism and undescribed germline
gene segments.
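
The P nucleotide reasoning above (junctional bases that are the reverse complement of the terminal bases of the germline coding end) can be sketched in Python as follows. This is an illustrative check under stated assumptions, not the analysis used in the study: the function names are hypothetical, and only the terminal bases quoted in the text (CC for the MP6 germline end, C for MP10) and the four junctional bases (GGTA and GCGA) are used as input.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def p_nucleotide_length(coding_end, junction):
    """Number of leading junction bases that could be P nucleotides.

    A junction prefix of length k is treated as a candidate P addition
    when it equals the reverse complement of the last k bases of the
    germline coding end (hypothetical helper for illustration).
    """
    coding_end, junction = coding_end.upper(), junction.upper()
    longest = 0
    for k in range(1, min(len(coding_end), len(junction)) + 1):
        if junction[:k] == reverse_complement(coding_end[-k:]):
            longest = k
    return longest


# MP6: germline end ...CC, junctional bases GGTA. GG is palindromic to CC,
# leaving TA unexplained (possible N addition by TdT).
print(p_nucleotide_length("CC", "GGTA"))
# MP10: terminal C, junctional bases GCGA. Only the first G is explained.
print(p_nucleotide_length("C", "GCGA"))
```

Bases beyond the returned length are the ones that, as in the text, cannot be attributed to P addition and are candidates for N segment addition.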

D and J gene segments 

In the heavy chain variable region the D segment
comprises the majority of CDR3 and is believed to be
responsible for much of the antibody specificity. With few
exceptions, the assessment of D segment utilization in
human immunoglobulins has been, and continues to be,
difficult, due to the generally poor homology when
comparing expressed D segments and the known germline D
segments. Similarly, our D segment analysis has revealed
limited homology between any of the expressed sequences
and the known germline sequences. It has been
demonstrated that D segments can be found expressed in the
forward and reverse orientations, as well as fused to each
other in both orientations (Meek et al., 1989; Sanz, 1991;
Tuaillon et al., 1993). One example of a possible D-D
fusion is found with the MP1 D segment. The expressed
sequence is homologous to two fetal D gene segments,
D21-7 (Sanz, 1991) at the 5' end and D21-9 (Sanz, 1991)
at the 3' end, with flanking nucleotides at both ends which
do not exhibit significant homology to any germline
sequence (comparison not shown). The first 15
nucleotides of the MP12 D segment are identical to the
germline DXP4 gene segment (Sanz, 1991), leaving 10
nucleotides without significant homology to any other
known D gene (comparison not shown). The profile of
this stretch of nucleotides is uncharacteristic of N
segment addition, as it is A/T-rich; therefore it may represent
an undescribed germline sequence. Although the
expressed D segments range in size from nine to 42
nucleotides, no other significant homologies were found.
In addition, without a definite germline D segment
donor and V gene flanking sequence, it becomes difficult
to analyse the derivation of individual nucleotides. Many
of the known germline V gene segments have been
reported in the literature as PCR-generated fragments
which do not include any flanking sequence that may
become part of the expressed sequence during
rearrangement. In the case of MP6, the first two nucleotides of
the defined D segment (GA) may have derived from the
two nucleotides that are present in the germline sequence
between the end of the coding sequence and the RSS.
There does not appear to be any P nucleotide addition at
either the V-D or the D-J junction in any of the heavy
chain variable regions.

Among these antibodies, one expresses JH2, one
expresses JH4, two express JH3 and two express JH5. This
differs from previous studies where JH4 has been shown
to predominate in the normal adult antibody repertoire,
while JH3 and JH5 are significantly decreased (Table 1;
Tuaillon et al., 1993; Yamada et al., 1991).

With the exception of MP6, all of the antibodies exhibit
signs of exonuclease activity at the 5' end of the JH
segment, i.e. the homology between the expressed JH and
the germline JH begins approximately three to five
nucleotides from the 5' end of the germline sequence. All of the
expressed JH gene segments display nucleotide differences
when compared to the corresponding germline sequences.

In the λ-expressing clones, all three utilize the Jλ3 gene
segment. This finding is similar to our previous studies in
which Jλ2 and Jλ3 predominate among λ light chains. In
contrast to the heavy chain J segments, the expressed
Jλ genes do not appear to have undergone exonuclease
digestion. In addition, MP1 Jλ is completely germline,
while MP9 and MP12 have only one nucleotide difference
each, compared to the germline Jλ3. In the κ-expressing
clones, one expresses Jκ4 and two express Jκ2. As with
the λ light chains, three of the κ light chains exhibit no



Fig. 4. Comparison of the nucleotide sequences of anti-HLA light chain variable regions with the
closest germline gene segment. Identity between the sequences is indicated by the periods; CDR1,
CDR2 and CDR3 are denoted by the asterisks; the RSS and additional, non-coding germline
nucleotides are shown in italic. (a) MP1VL vs VIII.1 (Combriato and Klobeck, 1991); (b) MP6VL vs
O2/O12 (Pargent et al., 1991); (c) MP9VL vs DPL10 (Williams and Winter, 1993); (d) MP10VL vs
L16 (Huber et al., 1993); (e) MP12VL vs hslv2046 (Irigoyen et al., 1994); and (f) MP14VL vs A27
(Straubinger et al., 1988).



signs of exonuclease activity at the 5' end of the J segment.
Two of these three Jκ segments are germline (MP6 and
MP14) and the third has only one nucleotide difference
compared to the germline sequence (MP10).

DISCUSSION 

Currently, the data regarding the repertoire of variable
region gene segments utilized in response to alloantigens
are limited to a single study involving a substantial group
of monoclonal antibodies specific for a number of
structurally distinct human blood group antigens (Thompson
et al., 1991). It was found that a significant proportion
of these antibodies utilized a single gene segment, V4-34
(VH4-21). To further examine the alloantibody
repertoire, we have sequenced, at the nucleotide level, the
heavy and light chain variable region gene segments
expressed by six human IgM monoclonal alloantibodies
which are specific for various HLA class I and class II
molecules. From these results we conclude the following:
(1) the distribution of VH and VL gene families reflects
what has been described previously, i.e. a predominance
of the VH3 gene family and, to a lesser extent, the Vκ3
and Vλ2 gene families; (2) there is not a restriction in the
expression of individual gene segments; and (3) in general,
there is minimal apparent somatic mutation, compared
to our previous studies describing antibodies to a variety
of other exogenous antigens.

As mentioned previously, due primarily to the
prevalence of V region CRI-association with different
autoimmune diseases, it has been hypothesized that different
gene segments are maintained in the germline for use in
different types of responses. The interpretation of the
data regarding the characterization of variable region
gene segment usage in a number of different responses to
both exogenous and endogenous antigens would suggest
that the majority of germline variable region gene
segments can be utilized for virtually any antigen specificity
(for references, see Introduction). One exception may be
the V4-34 (VH4-21) gene segment (Pascual and Capra,
1992). This gene segment has been shown to encode the
CRI, 9G4, expressed on human cold agglutinins with the
I/i specificity (Stevenson et al., 1986). Although this gene
segment is highly represented in fetal and adult lymphoid
tissues, it is found encoding only 0.2 and 0.6% of normal
serum IgM and IgG protein, respectively (Pascual and
Capra, 1992). Thus far, V4-34 has only been found to
be utilized in the autoantibody and anti-blood group
alloantibody pools, presumably representing some
fraction of the aforementioned serum IgM and IgG. It has
only been found in one of 30-40 antibodies specific for
other exogenous antigens (Bogard et al., 1993). In a study
of a large group of human alloantibodies reactive to a
diverse array of structurally distinct blood group
antigens, V4-34 was found to be expressed by 64% of IgM
and 21% of IgG expressing clones (Thompson et al.,
1991). Since the panel of antigen specificities was diverse,
both in number and structure, this suggested that V4-34
may be responsible for a significant portion of
alloantibody responses. The current study indicates that a
number of different gene segments are recruited in response to
other alloantigens; consequently, the previously observed
usage of V4-34 in anti-blood group antibodies may be
the result of something inherent to the structure of the



Table 4. Summary of nucleotide differences between expressed and germline VL gene segments in anti-HLA antibodies

             FR1            CDR1           FR2            CDR2           FR3            CDR3
Cell line    R/S   % MUT    R/S   % MUT    R/S   % MUT    R/S   % MUT    R/S   % MUT    R/S   % MUT
MP1          0/0   0.0%     3/0   9.1%     3/1   8.9%     0/2   9.5%     0/1   1.0%     2/0   9.5%
MP6          0/0   0.0%     0/0   0.0%     0/0   0.0%     0/0   0.0%     0/0   0.0%     3/1   16.7%
MP9          1/0   1.4%     0/0   0.0%     2/0   4.4%     1/0   4.8%     1/1   2.1%     3/0   12.5%
MP10         0/0   0.0%     1/1   6.1%     0/0   0.0%     0/0   0.0%     1/0   1.0%     2/1   14.3%
MP12         0/1   1.4%     1/0   2.4%     0/0   0.0%     0/0   0.0%     2/1   3.1%     1/0   4.2%
MP14         0/0   0.0%     1/0   2.8%     1/0   2.2%     0/0   0.0%     0/0   0.0%     3/2   23.8%
red blood cell itself or the structure of its antigens that is
currently unknown. The results of this study show that
the distribution of variable region gene families and the
ratio of κ and λ light chain-bearing antibodies generally
reflect the findings of others, i.e. a predominance of the
VH3 heavy chain gene family and the Vκ3 and Vλ2 light
chain gene families. However, due to the sample size and
the diverse array of specificities, no correlations can be
made between V genes and HLA polymorphisms.

Unlike our previous studies, none of the antibody V
regions described here express gene segments derived
from single (or two) member families; therefore it is
difficult to precisely determine the extent of somatic
mutation. Although there is now extensive germline
sequence data (Matsuda et al., 1993; Cook et al., 1994),
the true extent of polymorphism in individual gene
segments is still unknown, thereby allowing only assessment
of apparent somatic mutation by comparing expressed
sequences with a given germline or rearranged gene
segment that displays the most significant nucleotide
homology. This analysis suggests that these expressed
heavy and light chain variable regions are somewhat less
mutated than many of the previously described
sequences, some of which exhibited as much as 15%
disparity from the closest germline gene segment (Andris
et al., 1991). This observation may be reflective of two
recent studies of the extent of somatic mutation in
different subsets of B lymphocytes representing various stages
of maturation (Pascual et al., 1994; Kuppers et al., 1993).
It was found that the IgM+, IgD+ naive B cells express
variable region gene segments that are germline in origin,
with no detectable γ transcripts, whereas germinal center
(antigen-stimulated) B cells fall into two groups with
regard to somatic mutation. The first group contains
those cells which still express IgM and exhibit low to
moderate somatic mutation, while the second group
includes the IgG-expressing transcripts that have a more
extensive number of mutations. The IgM alloantibodies
described in this study may be representative of the IgM
antibodies in the germinal center, while previously
described IgG antibodies may represent the second group
of germinal center cells that have undergone isotype
switching and more extensive mutation and selection.
More precisely, they may represent B lymphocytes from
the memory cell pool, as the length of time from the last
immunization to the time of sampling ranges from 1 to 6
years. This assessment still fits the scheme set forth by
Pascual et al., as they describe the IgM-expressing
memory cell pool to exhibit an average of 3-6 nucleotide
substitutions. Although IgG-bearing cells appear to
predominate in the memory cell population, the IgM positive
memory cells may represent those cells that have not
undergone extensive somatic mutation and class
switching, both of which may require a more prolonged
exposure to antigen. Alternatively, these cells may have
been sampled from the pool of newly generated cells, and
therefore represent IgM+ cells that have been recently
recruited, rather than recirculating memory cells.

Nonetheless, taken as a whole, the nucleotide
differences observed in these anti-HLA antibodies are
characteristic of cells which have undergone some amount of
somatic hypermutation and antigenic selection. Both the
heavy and light chain sequences exhibit a ratio of
replacement to silent substitutions that ranges from 2:1 to 8:1,
and the substitutions fall mainly within the CDRs. There
are, however, a significant number of replacement
substitutions in FRs as well, an observation that has been
documented previously, suggesting either a direct or indirect
role for these residues in antigen binding. On the other
hand, silent substitutions may either be the result of random
mutational events introduced by the somatic
hypermutation machinery or may represent polymorphism in
a subfamily of germline gene segments. Without germline
gene information from the original donors of each of
these cell lines, these possibilities cannot be precisely
assessed.

Not only do these cell lines provide an opportunity to
examine a part of the human alloantibody repertoire,
they are a source of HLA-typing reagents. Although
much of the HLA-typing is currently performed using
PCR-based methods, a large portion still depends on
serological methods and, therefore, a panel of reagents
which can distinguish individual polymorphisms within
a subgroup of molecules. There are a number of murine
monoclonal antibodies that recognize human HLA
molecules, but often they do not distinguish polymorphic
determinants. Therefore it is necessary to obtain
alloantisera. Unfortunately there are drawbacks to alloantisera
as well, such as consistently low titers of antibody and
the need to absorb with platelets to remove anti-class
I antibodies. The development of human monoclonal
antibodies eliminates some of these problems, but often
others remain, like low production of antibody and
genetic instability of the antibody-producing cell lines.

Despite the disadvantages associated with the use of
alloantisera, the epitopes defined by most human
anti-HLA monoclonal antibodies represent polymorphic
structures reflecting the allelic variation within the HLA
system. Therefore, human monoclonal antibodies may
aid in understanding the molecular mechanism of
processes in which HLA polymorphism plays a critical role,
such as graft rejection, antigen presentation, and
susceptibility to certain HLA-associated autoimmune
diseases, including insulin-dependent diabetes, celiac
disease and rheumatoid arthritis. They can also serve as
useful HLA-typing reagents for organ transplantation
and molecular mapping of the allo-HLA polymorphic
epitopes that are involved in antibody and T lymphocyte
recognition. Nucleotide sequences of a large number of
HLA class I and class II alleles are now available which
aid in the definition of such epitopes and their location
in the three-dimensional structure of HLA molecules.
Since certain class II molecules may predispose some
individuals to the development of particular autoimmune
diseases, human anti-HLA monoclonal antibodies could
provide a means of identification of the most important
functional epitopes associated with disease susceptibility.
In addition, human anti-HLA monoclonals may have
therapeutic application in autoimmune disorders, as
has been successfully achieved in animal models
(Adelman et al., 1983).
Human immunoglobulin repertoire studies continue to
expand and currently include the structural analysis of
the VH and VL gene segments expressed in B cells that
have been exposed to a large number of different antigens.
In general, it is apparent that human antibodies,
regardless of their specificity or disease of origin, can be derived
from a substantial number of different germline gene
segments which are (likely) subjected to somatic
hypermutation and clonal selection. However, the issue of
biased gene usage remains to be more accurately defined.

CONCLUSIONS 

In conclusion, we have cloned and sequenced the heavy
and light chain variable region gene segments utilized by
six human monoclonal alloantibodies specific for various
human class I and class II HLA molecules. Analysis
of these expressed sequences revealed the predominant
utilization of VH3 gene segments (four of six) with one
VH1 and one VH4. Three of the light chains are κ and
three are λ. The light chain V segments were shown to be
derived from several gene families including Vλ2, Vλ3,
Vκ1 and Vκ3. We conclude that the distribution of VH and
VL gene families reflects the expected level of expression
based on the size of the family. In addition, these
expressed sequences appear to be derived from a diverse array
of germline gene segments, indicating that there is not a
restriction at the level of individual gene segments. Third,
there is minimal apparent somatic mutation, compared
to previous studies describing antibodies to a variety of
other exogenous antigens. However, the mutation that is
present appears to have been antigen driven, as the ratio
of replacement to silent substitutions ranges from 2:1 to
8:1.
INTRODUCTION

The production of combinatorial antibody libraries in
bacteria is based on the efficient cloning of active
immunoglobulin genes into bacterial expression vectors. This is
usually achieved by using the polymerase chain reaction (PCR)
to amplify complementary DNA (cDNA) of the active
immunoglobulin genes with primers containing restriction sites
which enable directional cloning into bacterial expression
vectors [1-5]. The complexity and representation of the library
achieved depend largely on the ability of the primers used
to amplify a broad spectrum of immunoglobulin variable
region genes. Increased representation within a library can
also be achieved by increasing the number of classes of
immunoglobulin genes included. In designing primers for
repertoire cloning, we decided to examine the specificity of
PCR primers used for the amplification of active human
immunoglobulin V-region genes, and to extend the
number of immunoglobulin classes included. In order to be
compatible with existing Fab phage display vectors [2, 3],
primers were designed to amplify the major Vκ and Vλ gene
families as intact light chains, and to amplify VH gene families
as Fd fragments from IgG, IgM and IgA. To test these
primers we amplified and cloned several light and heavy
chain immunoglobulin genes from clonal populations of
B-cells. As a source of clonal B-cell populations to test our
primers, we selected B-cell lines from a Burkitt's lymphoma
(Daudi) [6], two multiple myelomas (RPMI 8226 [7] and
IM-9 [8]), an undifferentiated B-cell lymphoma (MC116) [9] and
in vitro Epstein-Barr virus (EBV)-transformed B-cells
(Dakiki) [10].

MATERIALS AND METHODS 

Cell culture. Cell lines RPMI 8226 (CCL 155), IM-9 (CCL 159),
Daudi (CCL 213), Dakiki (TIB 206) and MC116 (CRL 1649) were
obtained from ATCC (Rockville, MD, USA) and grown in RPMI
1640, supplemented with 25 µg/ml gentamicin, 2 mM L-glutamine
and 20% FBS. Each cell line was grown until the number of cells
exceeded 10^8, at which point they were harvested, washed once in
PBS, and then used for the preparation of RNA.

RNA extraction and cDNA synthesis. Total cellular RNA was
extracted using standard techniques [11]. Briefly, 10^8-10^9 cells from
each cell line were dissolved by homogenization in 10 ml 4 M
guanidine thiocyanate supplemented with 0.1% sarcosine and 80 µl
beta-mercaptoethanol. DNA was then sheared by 10 passes through
a 21 gauge needle, followed by 10 passes through a 25 gauge needle.
This cellular homogenate was then loaded onto a cesium chloride
cushion (density = 1.62 g/ml), and centrifuged at 300 000 g for 16 h
at 20°C in a Beckman Ti50 rotor. Total RNA was recovered as a pellet,
washed with ethanol, resuspended in 500 µl DEPC-treated water,
precipitated with 50 µl of 3 M sodium acetate and 1 ml of ethanol,
resuspended in a final volume of 300 µl and quantified by absorbance
at 260 nm. Typical yields were 1-4 µg/µl for a total of 300 µg to
1.2 mg from 10^8 cells. 5-25 µg of total RNA was denatured by
incubation at room temperature (23°C) for 5 min with 1 mM methyl
Table 1. Primers used for PCR amplification of human immunoglobulin genes

Chain   Primer  Sequence                                                                       Specificity
Light   1a      ACA TGT GAGCTC CAG ATG ACC CAG TCT CC                                          Vκ1 & Vκ4
        1b      CAG TGG GAGCTC GTG ATG ACT CAG TCT CC                                          Vκ2 & [illegible]
        1c      ACC GGA GAGCTC GTG TTG ACG CAG TCT CC                                          [illegible] & Vκ5
        2       GCG CCG TCTAGA ACT AAC ACT CTC CCC TGT TGA ACC TCT TTG TGA CGG GCG AAC TCA G
        3a      GCG ATC GAGCTC TCT GTG CTG ACT CAG CC                                          Vλ1
        3b      TCC TGG GAGCTC TCT GCC CTG ACT CAG CC
        3c      TCT GTG GAGCTC TAT GTG CTG ACT CAG CC                                          Vλ[illegible]
        3d      TCT GTG GAGCTC TCT GAG CTC ACT CAG GA                                          Vλ[illegible]
        3e      TCC AAT GAGCTC ACT GTG GTG ACT CAG GA                                          Vλ7
        4       GCG CCG TCTAGA CTA AGA ACA TTC TGC AGG GGC CA                                  Cλ
Heavy   5a      G GTC CTG CTCGAG GTG CAG CTG GTG CAG TCT GG                                    VH1 & VH5
        5b      G GTC CTG CTCGAG GTG CAG CTG CAG GAG TCG GG                                    VH4
        5c      GTC CTG CTCGAG GTG CAG CTG GTG GAG TCT GG                                      VH3
        5d      GTC CTG CTCGAG GTC ACC TTG AAG GAG TCT GG                                      VH2
        5e      G GTC CTG CTCGAG GTG CAG CTA CAG CAG TGG GG                                    VH4-21
        5f      T GTC CTG CTCGAG GTA CAG CTG CAG CAG TCA GG                                    VH6
        6a      CAG AGT ACTAGT CTT GTC CAC CTT GGT CTT GCT                                     IgG
        6b      GTG AGT ACTAGT ACA AGA TTT GGG CTC AAC T                                       IgG1
        6c      CTC AGC ACTAGT TGG TAG AGG CAC GTT CTT TT                                      IgM
        6d      CAG AGT ACTAGT TGG GCA GGG CAC AGT CAC AT                                      IgA

Restriction sites inserted for cloning are underlined in the original; here they appear as unspaced six-base blocks.

mercury hydroxide. The RNA was then put on ice, mixed with the
remaining reagents, then incubated for 1 h at 42°C. The reaction
conditions were: 35 mM beta-mercaptoethanol; 1 mM each dNTP;
0.1 µg/µl random hexamers; 10 mM Tris-HCl pH 8.8; 50 mM KCl; 6.5
mM MgCl2; 0.01% gelatin; 500 units/ml RNasin; 2 units/µl AMV
RT. cDNA was used directly for PCR.

Polymerase chain reaction. Reaction volumes were 100 or 200 µl
using 5 or 10 µl of a cDNA reaction for each PCR reaction.
Conditions for PCR were 10 mM Tris-HCl pH 8.8; 50 mM KCl;
1.5 mM MgCl2; 0.01% gelatin; 160 µM each dNTP; 2.5 µM each
primer. Reaction parameters were 94°C/1 min; 45°C/2 min;
72°C/1 min, 40 cycles. The final cycle was followed by a
7 min/72°C primer extension phase.
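
The programmed hold times of the cycling conditions above can be totalled with a few lines of Python. This is only a sketch: the step names are illustrative labels that do not appear in the text, and ramp times between temperatures, which depend on the thermal cycler, are not included.

```python
# Hold times of the cycling program described above, in minutes.
# Step names are illustrative labels, not terms used in the text.
CYCLE_STEPS = [
    ("denaturation", 94, 1),
    ("annealing", 45, 2),
    ("extension", 72, 1),
]
CYCLES = 40
FINAL_EXTENSION_MIN = 7  # final 72°C primer extension phase


def total_hold_minutes(steps, cycles, final_extension):
    """Sum the programmed hold times, ignoring ramping between steps."""
    per_cycle = sum(minutes for _name, _temp_c, minutes in steps)
    return cycles * per_cycle + final_extension


if __name__ == "__main__":
    print(total_hold_minutes(CYCLE_STEPS, CYCLES, FINAL_EXTENSION_MIN))
```

With the values given, the program holds for 167 minutes in total before any ramping is counted.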

The primers used for the PCR amplification of human immunoglobulin
genes are shown in Table 1. Primers 1a (Vκ1 and Vκ4), 1b (Vκ2 and
Vκ[illegible]) or 1c (Vκ3 and Vκ5) were used with primer 2 in
separate reactions for the amplification of full-length kappa
chains. Primer 2 was designed to incorporate a C to T mutation to
eliminate a naturally occurring Sac-I site in the human Cκ gene.
Primers 3a (Vλ1), 3b (Vλ2), 3c (Vλ3), 3d (Vλ4) and 3e (Vλ7) were
used with primer 4 to amplify full-length lambda immunoglobulin
light chains.

Amplification of heavy chain Fd fragments was achieved using
primers 5a through 5f with primers 6a (γ1-4), 6b (γ1), 6c (μ) or
6d (α1 and α2). Primer 5a was designed to recognize group I heavy
chain variable (VH1) domains, but due to the homology between
groups VH1 and VH5, should also recognize VH5 genes. Primer 5b was
designed to amplify VH2 genes and primer 5c to recognize VH3
genes. Primers 5d and 5e were designed to amplify the VH4 family
of genes, primer 5e being designed to recognize the VH4-21 group of
germline sequences. Primer 5f was designed to recognize sequences
derived from the single VH6 germline sequence.
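
The cloning sites carried by the primers can be located with a plain string search. The sketch below uses a subset of the Table 1 sequences. Sac-I, Xho-I and Spe-I are named in the text; assigning the TCTAGA hexamer in primers 2 and 4 to Xba-I is an assumption based on the standard recognition sequence, and the function name `find_sites` is hypothetical.

```python
# Recognition sequences of the cloning enzymes. Xba-I is inferred from
# the TCTAGA hexamer in primers 2 and 4; it is not named in the text.
SITES = {
    "Sac-I": "GAGCTC",
    "Xba-I": "TCTAGA",
    "Xho-I": "CTCGAG",
    "Spe-I": "ACTAGT",
}

# A subset of Table 1, written 5' to 3'.
PRIMERS = {
    "1a": "ACA TGT GAGCTC CAG ATG ACC CAG TCT CC",
    "3d": "TCT GTG GAGCTC TCT GAG CTC ACT CAG GA",
    "4": "GCG CCG TCTAGA CTA AGA ACA TTC TGC AGG GGC CA",
    "5c": "GTC CTG CTCGAG GTG CAG CTG GTG GAG TCT GG",
    "6c": "CTC AGC ACTAGT TGG TAG AGG CAC GTT CTT TT",
}


def find_sites(primer):
    """Return {enzyme: [0-based start positions]} for sites in a primer."""
    seq = primer.replace(" ", "").upper()
    hits = {}
    for enzyme, site in SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


if __name__ == "__main__":
    for name, primer in PRIMERS.items():
        print(name, find_sites(primer))
```

A search of this kind also flags any primer whose annealing region happens to contain a second copy of a cloning site, which is worth checking before digestion.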



Cloning. PCR products were separated by electrophoresis through
a 1.5% agarose gel. Bands at approximately 650 bp were isolated
using DEAE paper [11], then digested with the appropriate
restriction enzymes and cloned into pBluescript-II KS- (Stratagene,
La Jolla, CA, USA) using standard techniques [11]. Clones
containing inserts were sequenced using standard dideoxy
techniques [11]. The heavy chain Fd fragment derived from cell line
Dakiki was initially refractory to cloning. Post restriction
digestion analysis of the PCR product revealed bands of
approximately 500 bp and 150 bp. This Fd fragment was then cloned
in two parts, as an Xho-I/Spe-I fragment of 140 bp, and as an
Spe-I/Spe-I fragment of 510 bp, demonstrating that the additional
Spe-I site was in the VH region, which was confirmed by sequencing.
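
The effect of an internal Spe-I site on the digest can be illustrated with a small in-silico example. This is a sketch under stated assumptions: the insert is a synthetic placeholder built from poly-A filler, not the Dakiki Fd sequence, the function names are hypothetical, and each cut is placed at the start of the recognition site rather than at the enzyme's true cleavage position.

```python
XHO_I = "CTCGAG"
SPE_I = "ACTAGT"


def site_positions(seq, site):
    """0-based start positions of every occurrence of a recognition site."""
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def fragment_lengths(seq, sites):
    """Fragment lengths after cutting at the start of each site.

    Real enzymes cut at a fixed offset inside the site; cutting at the
    site start is a simplification that shifts sizes by a few bases.
    """
    cuts = sorted(p for site in sites for p in site_positions(seq, site))
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    # Synthetic insert: Xho-I site, 134 nt filler, internal Spe-I site,
    # 504 nt filler, 3' Spe-I site. The filler is poly-A, not real sequence.
    insert = XHO_I + "A" * 134 + SPE_I + "A" * 504 + SPE_I
    print(fragment_lengths(insert, [XHO_I, SPE_I]))
```

For this placeholder the digest yields pieces of 140 and 510 positions plus the 6-base remnant of the 3' site, mirroring the two-part cloning described above.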

Database homology searches. The sequences described in this
report were compared with the December 1994 update of the GenBank
and EMBL databases. GenBank has assigned the following accession
numbers to the sequences described in this report. Cell line IM-9,
IgG Fd: U07985, κ chain: U07989. Cell line RPMI 8226, λ chain:
U07992. Cell line Dakiki, IgA Fd: U07986. Cell line Daudi, IgM Fd:
U07987, κ chain: U07990. Cell line MC116, IgM Fd: U07988, λ chain:
U07991.



RESULTS 

Immunoglobulin light chains

We were able to amplify four active light chain genes from the
five B-cell lines tested, the sequences of which are shown in
Table 2. Kappa light chains were amplified from cell lines
Daudi and IM-9 with the Vκ1 primer alone, and sequencing
confirmed that both V-region genes were from the Vκ1 family.
PCR amplification of lambda light chains using cDNA from
MC116 RNA as a template gave positive reactions for Vλ1
and Vλ3, with the reaction for Vλ[illegible] being considerably
stronger than that for Vλ[illegible]. Using cDNA from RPMI 8226
RNA as a template, positive reactions were obtained with primers
for Vλ1 and Vλ2, with the reaction for Vλ1 being stronger than
that for Vλ2. Sequence analysis of both sequences revealed
that they belong to the Vλ[illegible] family. The greater
amplification using a Vλ[illegible] primer was therefore
surprising. We failed to amplify immunoglobulin light chain from
cDNA derived from Dakiki RNA. A positive PCR signal was obtained
with primers designed to amplify only the Cλ region, and a
positive signal corresponding to a full-length lambda transcript
was obtained on a northern blot probed with a Cλ probe.
Additional primers were synthesized corresponding to variants of
Vλ1 and Vλ4, but these also failed to amplify the full-length
lambda transcript. We conclude from these observations that the
lambda light chain expressed in cell line Dakiki differs
significantly, at the site of our 5' Vλ PCR primers, from all of
the primers tested thus far.

The Daudi kappa chain contains a Jκ4 sequence and the
IM-9 light chain uses a Jκ1 J-region. Cell lines MC116 and
RPMI 8226 both utilize Jλ2-derived J-regions and Cλ2 constant
regions.

Comparison of the nucleotide sequences showed that the
sequence for the Daudi Vκ region differed from the previously
reported sequence [12] at three nucleotides, one of which
resulted in an amino acid change (a V to D change at a.a. 56).
We have sequenced three clones of the Daudi Vκ from three
separate cDNA syntheses; all three clones have the same
sequence, indicating that the observed divergence from the
previously reported sequence was real. It is not known
whether these differences reflect sequencing error in the
original report, or ongoing somatic mutation in the cell line.

The Daudi Vκ1 gene shows the greatest homology to
germline gene L11 [13], with which it was only 84.6%
identical. Comparison with rearranged Vκ1 genes indicated
that the Daudi Vκ1 region shared less than 90% identity with all
sequences in the database. The IM-9 Vκ1 gene was 96.3%
identical to germline Vκ1 gene L12 [14], and 94.9% identical
to germline gene HJ02 [15]. The MC116 Vλ region was
highly homologous (89-96%) to many rearranged
genes, but was most closely related to one germline
sequence (DPL13) [16], with which it shared 96% homology.

The RPMI 8226 gene showed 91.4% identity to the same germline
gene (DPL13) [16] to which the gene cloned from cell line MC116
was most closely related. Given the limited homology of the
RPMI 8226 gene to DPL13, it is possible that this gene is related
to a previously undescribed germline gene. The RPMI 8226 Vλ gene
was, however, completely identical to a rearranged gene
(V A 00I) [17]. It is remarkable that two rearranged genes from
separate B-cell lines should have identical Vλ-Jλ2 sequences,
particularly when the Vλ sequence concerned has diverged from the
nearest known germline gene by almost 9% (25 altered
nucleotides), possibly indicating cross-contamination of the
two cell lines.
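
The identity figures quoted above are position-by-position comparisons, and the two numbers given for RPMI 8226 (25 altered nucleotides, 91.4% identity) can be checked against each other. The sketch below assumes pre-aligned sequences of equal length with no gap handling; the function names and the toy sequences are illustrative and are not data from this report.

```python
def percent_identity(a, b):
    """Percent of matching positions between two equal-length,
    pre-aligned sequences (no gap handling)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def compared_length(mismatches, identity_percent):
    """Number of compared positions implied by a mismatch count and a
    percent identity."""
    return mismatches / (1.0 - identity_percent / 100.0)


if __name__ == "__main__":
    # Toy sequences, not data from the paper: 1 mismatch in 10 positions.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAA"))
    # 25 altered nucleotides at 91.4% identity.
    print(round(compared_length(25, 91.4)))
```

On these figures the comparison spans roughly 291 nucleotides, which is consistent with a V-region-sized segment and with the "almost 9%" divergence stated in the text.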

Immunoglobulin heavy chains

Four immunoglobulin heavy chain Fd fragments were successfully
amplified and cloned, the sequences of which are shown in
Table 3. An IgG1 Fd fragment was cloned from IM-9, an IgA1 Fd
fragment was cloned from Dakiki, and IgM Fd fragments were cloned
from cell lines Daudi and MC116. No attempt was made to amplify Fd
fragments from cell line RPMI 8226 on the basis of previous
reports of light chain expression only [7]. The active VH domains
from cell lines Daudi, Dakiki and IM-9 are from the VH3 family,
while that from cell line MC116 is from the VH1 family. Under the
conditions employed, there was little cross-recognition of VH
regions by primers designed to amplify different VH regions.
Using cDNA derived from Dakiki, we obtained specific
amplification with primers 5c and 6d (VH3/IgA); using cDNA from
IM-9, amplification was only observed with primers 5c and 6a
(VH3/IgG); and using cDNA derived from Daudi, amplification was
only seen with primers 5c and 6c (VH3/IgM). Only in the case of
MC116 was some reactivity seen with primers 5c and 6d on a VH1
template, in addition to amplification with primers 5a and 6d.
Primers 6a and 6b both amplified an IgG1 Fd fragment from
IM-9-derived cDNA with primer 5c. Both PCR products were cloned
and sequenced, and both were identical, with the exception of the
3' end, where the clone obtained using primer 6b contained the
N-terminal part of the IgG1 hinge as expected. Analysis of J-gene
usage showed that three of the four heavy chains cloned (Daudi,
Dakiki and MC116) contained JH4-related J-regions, while the IM-9
heavy chain contained a JH5-related sequence.

The VH3 gene from cell line Daudi was most closely related
to human germline gene DP53 [18]/HU [19], showing 84%
homology, but was almost as related (80-82% homologous)
to germline genes DP87 [20], DP58 [18], HHG19 [21] and
DP54 [18], as well as numerous rearranged genes. The VH3
gene cloned from IM-9 was 94.6% identical to germline gene
DP31 [18], and not closely related to any other germline VH
gene. The Dakiki VH3 gene was 93% homologous to three
germline genes, DP47 [18], V-B19.7 [22] and VH26 [23], and
was more than 90% homologous to many rearranged VH
gene sequences. The VH1 gene cloned from MC116 was
closely related to several germline genes: it showed 97.3%
and 96.9% identity to germline genes DP75 and DP8,
respectively [18], and was more than 90% identical to two other
germline genes, as well as five rearranged genes.

DISCUSSION 

We have designed oligonucleotide PCR primers which allow
amplification of human kappa and lambda light chains, and
alpha, gamma and mu heavy chain Fd fragments. The kappa
light chain primers were efficient in amplifying Vκ1-containing
kappa chain genes, and are similar to previously described
primers [5, 24]. We were able to amplify two of the three
lambda light chains we attempted, but not necessarily with
the primers predicted, indicating that further work may be
required to optimize the site and/or design of primers.
Amplification of heavy chain Fd fragments was specific, both
for class of heavy chain and for family of VH region. We had
tried without success to amplify the active heavy chain Fd
fragments with previously described VH region primers [5],
and arrived at the present VH region sequences by trial and
error. The strategy used in the placement of the Xho-I site,
and the overall VH region primer sequences are similar,
although not identical, to recently published VH region
primers [25].

The primers we have described were designed to be compatible
with previously described vectors for the display of
Fab fragments on the surface of filamentous phage. While we
have improved the primers available to work with these
vectors and expanded the number available to include both
IgM and IgA, there are some additional questions that have
arisen. Specifically, the discovery of an internal Spe-I
restriction site in one of the four heavy chain variable region
genes we cloned clearly indicates that any genes containing an
Spe-I site would not be represented in an antibody library
constructed using Spe-I. This, in turn, suggests that the use
of Spe-I for antibody library construction is less than ideal.
An additional question exists regarding the feasibility of
cloning IgM Fd genes and expressing them as Fab fragments,
due to the lack of a simple hinge in IgM. While it has
been demonstrated that such a molecule can be expressed
using a flexible linker to join the heavy and light chains [26], it
may also be possible to modify the IgM reverse primer (6c in
Table 1) to encode the sequence PCP (as with the IgA primer
6d) in place of PLP, thereby allowing the Fd fragment to
covalently link to the light chain. We have tested such a
primer and found it to amplify IgM Fd fragments with the
same efficiency as primer 6c.
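
The PLP and PCP motifs can be read directly off the reverse primers by taking the reverse complement and translating it. The sketch below uses the primer 6c and 6d sequences from Table 1 and the standard genetic code. The choice of reading frame, in which the Spe-I site ACTAGT is read as the two codons ACT AGT, is an assumption made for illustration, and the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
SPE_I = "ACTAGT"

# Reverse primers from Table 1, written 5' to 3'.
PRIMER_6C = "CTC AGC ACTAGT TGG TAG AGG CAC GTT CTT TT"  # IgM
PRIMER_6D = "CAG AGT ACTAGT TGG GCA GGG CAC AGT CAC AT"  # IgA


def sense_strand(primer):
    """Reverse complement of a reverse primer, i.e. the coding strand."""
    seq = primer.replace(" ", "").upper()
    return seq.translate(COMPLEMENT)[::-1]


def translate_in_site_frame(primer):
    """Translate the coding strand in the frame that reads the Spe-I
    site as two whole codons (an assumption, see the lead-in)."""
    sense = sense_strand(primer)
    frame = sense.index(SPE_I) % 3
    codons = (sense[i:i + 3] for i in range(frame, len(sense) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


if __name__ == "__main__":
    print("6c:", translate_in_site_frame(PRIMER_6C))
    print("6d:", translate_in_site_frame(PRIMER_6D))
```

In this frame the 6c translation contains PLP and the 6d translation contains PCP immediately before the Thr-Ser encoded by the Spe-I site, in line with the motifs named in the text.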

From the immunoglobulin sequences that we obtained
from these B-cell lines, there are clearly differences in the
extent to which the different genes have varied from known
germline genes. We were also able to compare the extent of
variation of VH and VL genes from the same B-cell line for
three of the cell lines we used. The genes from MC116
(derived from an undifferentiated lymphoma) were closest
to germline sequences. The VH gene was closely related
(97.3% and 96.9% identical) to two VH1 germline genes,
and the Vλ gene was 96% homologous to a germline Vλ
gene. Similarly, the V-region genes from the multiple
myeloma-derived cell line IM-9 were 94.6% (VH3) and 96.3% (Vκ1)
homologous to the nearest germline genes. In contrast, the
V-region gene sequences from the Burkitt's lymphoma-derived cell
line Daudi were highly divergent from both germline and
rearranged gene sequences. Both VH3 and Vκ1 genes were
approximately 84% homologous to the nearest germline
sequences, and almost equally distant from the most homologous
rearranged genes. This lack of homology could indicate
extensive somatic mutation, which would be unusual for
a cell that had not undergone the class switching from IgM to
IgG (or IgA) normally thought to be associated with somatic
mutation of active immunoglobulin genes. A recent report
has demonstrated ongoing intraclonal variation within a
follicular lymphoma expressing an IgM immunoglobulin
[27], thought to be due to ongoing somatic mutation driven
by antigen. Although no such observation has been made in
Burkitt's lymphoma, it is clearly possible for extensive
somatic mutation to occur in the absence of class switching.
A possible alternative explanation is that the germline genes
from which the Daudi sequences were derived have not yet
been characterized.

Overall, the sequences derived from these B-cell malignancies
support previous observations suggesting that Burkitt's
lymphomas are derived through malaria-parasite-antigen
driven mechanisms [28] and that VDJ regions from multiple
myelomas contain somatic mutations [29]. Our observations
on the active immunoglobulin genes of MC116 suggest that
undifferentiated B-cell lymphomas have limited somatic
mutation of their immunoglobulin genes, which would
argue against an antigen-driven mechanism of transformation
in these tumors. The extent to which both VH and VL genes
show the same degree of divergence from the nearest germline
V-region genes suggests that the mechanisms of somatic
mutation that give rise to these changes act in parallel on
both heavy and light chain variable region genes.
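
The parallel divergence of heavy and light chain genes can be tabulated from the identity figures quoted in the text. The sketch below takes the highest quoted identity to a germline gene for each chain; the Daudi values are the approximate figures given (84% for VH3 and 84.6% for Vκ1), and the function names are hypothetical.

```python
# Percent identity to the nearest germline gene, as quoted in the text.
# Daudi values are approximate ("approximately 84%").
IDENTITY = {
    "MC116": {"VH": 97.3, "VL": 96.0},
    "IM-9": {"VH": 94.6, "VL": 96.3},
    "Daudi": {"VH": 84.0, "VL": 84.6},
}


def divergence(identity_percent):
    """Percent divergence from germline, rounded to one decimal place."""
    return round(100.0 - identity_percent, 1)


def summarize(table):
    """Rows of (cell line, VH divergence, VL divergence, difference)."""
    rows = []
    for line, genes in table.items():
        vh, vl = divergence(genes["VH"]), divergence(genes["VL"])
        rows.append((line, vh, vl, round(abs(vh - vl), 1)))
    return rows


if __name__ == "__main__":
    for row in summarize(IDENTITY):
        print(row)
```

For each of the three cell lines the VH and VL divergences differ by less than two percentage points, while the divergence itself ranges from about 3% (MC116) to about 16% (Daudi).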